Jun 10, 2024

A Taxonomy of Challenges to Curating Fair Datasets

arXiv:2406.06407v2
6 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses fairness in ML datasets for researchers and practitioners. Its contribution is incremental: it offers a taxonomy and recommendations rather than novel methods.

The paper addresses the limited understanding of the practical aspects of curating fair machine learning datasets. Based on interviews with 30 dataset curators, it presents a taxonomy of challenges and trade-offs and closes with recommendations for systemic change.

Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fairness landscape that impact data curation. We conclude with recommendations aimed at fostering systemic changes to better facilitate fair dataset curation practices.

