CS 335: Fair, Accountable, and Transparent (FAccT) Deep Learning
Datasets and Project Ideas
We have prepared a few datasets related to our course. For many of them, we also found research papers that analyze these datasets. You are welcome to base your course projects on these datasets or on any other datasets you find.
1. Search Engine, Retrieval and Online Advertisement
Search Engine Dataset
Fairness: Mustafaraj, Eni, Emma Lurie, and Claire Devine. The case for voter-centered audits of search engines during political elections, FAT 2020.
2. NLP
Wikipedia Talk dataset
Dataset: Wulczyn, Ellery, Nithum Thain, and Lucas Dixon. Ex machina: Personal attacks seen at scale, Proceedings of the 26th International Conference on World Wide Web (WWW), 1391–1399, 2017
Fairness: Sweeney, Chris, and Maryam Najafian. Reducing sentiment polarity for demographic attributes in word embeddings using adversarial learning, FAT 2020
3. Education
IIT-JEE entrance exam
Dataset Description: Alderman, Harold, and Elizabeth M. King. Gender differences in parental investment in education, Structural Change and Economic Dynamics, 9(4):453–468, 1998
Fairness: Celis, L. Elisa, Anay Mehrotra, and Nisheeth K. Vishnoi. Interventions for ranking in the presence of implicit bias, FAT 2020
4. Finance
FICO dataset
Dataset Description: US Federal Reserve. Report to the congress on credit scoring and its effects on the availability and affordability of credit, 2007.
Interpretability: Liu, Lydia T., et al. The Disparate Equilibria of Algorithmic Decision Making when Individuals Invest Rationally, FAT 2020
5. Recommendation
MovieLens Dataset
Fairness: Dean, Sarah, Sarah Rich, and Benjamin Recht. Recommendations and User Agency: The Reachability of Collaboratively-Filtered Information, FAT 2020
6. Reinforcement Learning Environment
ML-fairness-gym
Fairness: D'Amour, Alexander, et al. Fairness is not static: deeper understanding of long term fairness via simulation studies, FAT 2020
7. Laws and Society
COMPAS for Bail Decisions
Dataset Description: Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks, ProPublica, 2016
Interpretability: Mothilal, Ramaravind Kommiya, Amit Sharma, and Chenhao Tan. Explaining machine learning classifiers through diverse counterfactual explanations, FAT 2020
Bike Rentals
Face Classification
Fairness: Buolamwini, Joy, and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification, FAT 2018
8. Healthcare