CS 335: Fair, Accountable, and Transparent (FAccT) Deep Learning

Datasets and Project Ideas

We have prepared a few datasets related to our course. For many of them, we also list research papers that analyze the dataset. You are welcome to develop your course project based on these datasets or on other datasets you find.

1. Search Engines, Retrieval, and Online Advertising

  • Yahoo! A1 Search Marketing Advertiser Bidding Dataset

    • Fairness: Nasr, Milad, and Michael Tschantz. Bidding Strategies with Gender Nondiscrimination Constraints for Online Ad Auctions, FAT 2020

  • Search Engine Dataset

    • Fairness: Mustafaraj, Eni, Emma Lurie, and Claire Devine. The case for voter-centered audits of search engines during political elections, FAT 2020.

  • Web Track Clueweb09 TREC Test Collection for Retrieval

    • Interpretability: Singh, Jaspreet, and Avishek Anand. Model agnostic interpretability of rankers via intent modelling, FAT 2020

  • YouTube Spam Comments (Text Classification)

    • Link: dt.fee.unicamp.br

    • Dataset Description: Alberto, Túlio C., Johannes V. Lochter, and Tiago A. Almeida. TubeSpam: comment spam filtering on YouTube. In 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), 138–43. IEEE, 2015

2. NLP

  • Wikipedia Talk dataset

    • Dataset: Ellery Wulczyn, Nithum Thain, and Lucas Dixon. 2017. Ex machina: Personal attacks seen at scale. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1391–1399

    • Fairness: Sweeney, Chris, and Maryam Najafian. Reducing sentiment polarity for demographic attributes in word embeddings using adversarial learning, FAT 2020

  • SemEval-2018 Task 1 Affect in Tweets

    • Dataset Description: Saif Mohammad, Felipe Bravo-Marquez, Mohammad Salameh, and Svetlana Kiritchenko. 2018. SemEval-2018 Task 1: Affect in Tweets. In Proceedings of The 12th International Workshop on Semantic Evaluation. Association for Computational Linguistics, 1–17.

    • Fairness: Sweeney, Chris, and Maryam Najafian. Reducing sentiment polarity for demographic attributes in word embeddings using adversarial learning, FAT 2020

  • Stanford Sentiment Treebank

    • Dataset Description: Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA, October 2013. Association for Computational Linguistics.

  • Amazon Multi-Domain Sentiment dataset

    • Dataset Description: John Blitzer, Mark Dredze, and Fernando Pereira. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 440–447, Prague, Czech Republic, June 2007. Association for Computational Linguistics.

    • Accountability: Mena, José, Oriol Pujol, and Jordi Vitrià. Dirichlet uncertainty wrappers for actionable algorithm accuracy accountability and auditability, FAT 2020

3. Education

  • IIT-JEE entrance exam

    • Dataset Description: Harold Alderman and Elizabeth M. King. Gender differences in parental investment in education. Structural Change and Economic Dynamics, 9(4):453–468, 1998.

    • Fairness: Celis, L. Elisa, Anay Mehrotra, and Nisheeth K. Vishnoi. Interventions for ranking in the presence of implicit bias, FAT 2020

  • Semantic Scholar Open Research Corpus

    • Fairness: Celis, L. Elisa, Anay Mehrotra, and Nisheeth K. Vishnoi. Interventions for ranking in the presence of implicit bias, FAT 2020

4. Finance

  • FICO dataset

    • Dataset Description: US Federal Reserve. Report to the Congress on credit scoring and its effects on the availability and affordability of credit, 2007.

    • Fairness: Liu, Lydia T., et al. The Disparate Equilibria of Algorithmic Decision Making when Individuals Invest Rationally, FAT 2020

  • German-Credit for assessing credit risk

    • Interpretability: Mothilal, Ramaravind Kommiya, Amit Sharma, and Chenhao Tan. Explaining machine learning classifiers through diverse counterfactual explanations, FAT 2020

  • Lending Club for loan decisions

    • Link: lendingclub.com

    • Interpretability: Mothilal, Ramaravind Kommiya, Amit Sharma, and Chenhao Tan. Explaining machine learning classifiers through diverse counterfactual explanations, FAT 2020
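
The counterfactual-explanation papers above ask: what is the smallest change to an applicant's features that would flip a credit decision? A minimal sketch of that search, using a hand-made two-feature linear "model" (not anything trained on German-Credit or Lending Club data, and with invented weights):

```python
import numpy as np

# Hand-made credit "model" on two scaled features (income, debt);
# weights and bias are invented for illustration only.
w = np.array([0.8, -0.5])
b = -0.2

def approve(x):
    return float(w @ x + b) > 0   # linear approve/reject decision

x = np.array([0.1, 0.6])          # a rejected applicant
assert not approve(x)

# Brute-force the smallest perturbation that flips the decision.
best = None
for dx0 in np.linspace(-1, 1, 41):
    for dx1 in np.linspace(-1, 1, 41):
        cand = x + np.array([dx0, dx1])
        if approve(cand):
            cost = abs(dx0) + abs(dx1)   # L1 cost favors sparse changes
            if best is None or cost < best[0]:
                best = (cost, cand)

cost, counterfactual = best   # the nearest approved applicant
```

Methods like the one in the paper replace this grid search with gradient-based optimization and add a diversity term so several distinct counterfactuals are returned.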

5. Recommendation

  • MovieLens Dataset

    • Fairness: Dean, Sarah, Sarah Rich, and Benjamin Recht. Recommendations and User Agency: The Reachability of Collaboratively-Filtered Information, FAT 2020
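
As background for the reachability question studied by Dean et al., the sketch below shows the kind of collaboratively-filtered recommender they analyze: matrix factorization fit by gradient descent. The 4x4 rating matrix is synthetic, not MovieLens itself:

```python
import numpy as np

# Tiny matrix-factorization recommender (0 marks an unobserved rating).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0

rng = np.random.default_rng(0)
k = 2                                    # latent dimension
U = rng.normal(scale=0.1, size=(4, k))   # user factors
V = rng.normal(scale=0.1, size=(4, k))   # item factors

lr, reg = 0.02, 0.01
for _ in range(2000):
    err = mask * (R - U @ V.T)           # error on observed entries only
    U += lr * (err @ V - reg * U)        # gradient step for users
    V += lr * (err.T @ U - reg * V)      # gradient step for items

pred = U @ V.T                           # predicted score for every (user, item)
rmse = np.sqrt(((mask * (R - pred)) ** 2).sum() / mask.sum())
```

The reachability question is then: by changing only their own observed ratings, which rows of `pred` can a user actually steer the system into recommending?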

6. Reinforcement Learning Environment

  • Ml-fairness-gym

    • Fairness: D'Amour, Alexander, et al. Fairness is not static: deeper understanding of long term fairness via simulation studies, FAT 2020
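
To see why "fairness is not static", one can simulate a decision feedback loop in the spirit of ml-fairness-gym. The dynamics below (scores drift up on approval, down on rejection) are invented for illustration and are much simpler than the gym's environments:

```python
import numpy as np

# Two groups of applicants with different starting score distributions;
# a fixed threshold policy is applied each round.
rng = np.random.default_rng(3)
scores = {"A": rng.normal(0.55, 0.1, 500),   # group A starts higher
          "B": rng.normal(0.45, 0.1, 500)}
threshold = 0.5

gaps = []
for step in range(20):
    for g, s in scores.items():
        approved = s >= threshold
        s[approved] += 0.01      # approval improves future scores
        s[~approved] -= 0.01     # rejection degrades them
    gaps.append(scores["A"].mean() - scores["B"].mean())

# The between-group gap grows over time even though the policy never changes.
```

A single-round fairness audit of this policy would miss the compounding disparity, which is the paper's core point.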

7. Laws and Society

  • COMPAS for Bail Decision

    • Dataset Description: Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica

    • Interpretability: Mothilal, Ramaravind Kommiya, Amit Sharma, and Chenhao Tan. Explaining machine learning classifiers through diverse counterfactual explanations, FAT 2020

  • Adult-Income for Income Prediction

    • Link: UCI ML Repository

    • Dataset Description: Ronny Kohavi and Barry Becker. 1996. UCI Machine Learning Repository.

    • Interpretability: Mothilal, Ramaravind Kommiya, Amit Sharma, and Chenhao Tan. Explaining machine learning classifiers through diverse counterfactual explanations, FAT 2020
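
A common starting point for fairness projects on Adult-Income is the demographic-parity gap: the difference in positive-prediction rates between demographic groups. The sketch below uses synthetic stand-in arrays; a real project would load the UCI data and use a trained classifier's predictions:

```python
import numpy as np

# Synthetic protected attribute and deliberately biased predictions
# (group 1 is favored); these stand in for real model outputs.
rng = np.random.default_rng(1)
sex = rng.integers(0, 2, size=1000)
pred = (rng.random(1000) < np.where(sex == 1, 0.6, 0.4)).astype(int)

rate0 = pred[sex == 0].mean()   # positive-prediction rate, group 0
rate1 = pred[sex == 1].mean()   # positive-prediction rate, group 1
gap = abs(rate1 - rate0)        # demographic-parity gap
```

Other group metrics (equalized odds, calibration) follow the same pattern, conditioning additionally on the true label.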

  • Bike Rentals

  • Face Classification

    • Fairness: Buolamwini, Joy, and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification, FAT 2018
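
A Gender Shades-style audit compares a classifier's accuracy across intersectional subgroups rather than along one attribute at a time. The sketch below uses synthetic labels and predictions instead of face-classification data:

```python
import numpy as np

# Synthetic audit data: two binary attributes and predictions that are
# deliberately less accurate for the (gender=1, skin=1) subgroup.
rng = np.random.default_rng(2)
n = 2000
gender = rng.integers(0, 2, n)
skin = rng.integers(0, 2, n)
y = rng.integers(0, 2, n)
acc_target = np.where((gender == 1) & (skin == 1), 0.70, 0.95)
correct = rng.random(n) < acc_target
y_hat = np.where(correct, y, 1 - y)

accs = {}
for g in (0, 1):
    for s in (0, 1):
        m = (gender == g) & (skin == s)
        accs[(g, s)] = (y_hat[m] == y[m]).mean()
        print(f"gender={g} skin={s}: accuracy={accs[(g, s)]:.2f}")
```

The key observation in the paper is that aggregate or single-attribute accuracy can look fine while one intersectional subgroup fares far worse, exactly as in this toy setup.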

8. Healthcare

  • Risk Factors for Cervical Cancer

    • Link: UCI ML Repository

    • Dataset Description: Fernandes, Kelwin, Jaime S. Cardoso, and Jessica Fernandes. Transfer learning with partial observability applied to cervical cancer screening. In Iberian Conference on Pattern Recognition and Image Analysis, 243–50. Springer, 2017

  • Multiparameter Intelligent Monitoring in Intensive Care III (MIMIC-III) database

    • Dataset Description: Alistair EW Johnson, Tom J Pollard, Lu Shen, H Lehman Li-wei, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific data 3 (2016), 160035.

    • Interpretability: Hancox-Li, Leif. Robustness in Machine Learning Explanations: Does It Matter?, FAT 2020