LG · CY · DB · Jan 18, 2021

Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification

arXiv:2101.07361v4 · 32 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for systematic evaluation of fair classification methods to guide practitioners in critical decision-making systems, though it is incremental as it focuses on comparative analysis rather than introducing new methods.

The paper tackles the problem of fair classification by conducting a broad experimental analysis of 13 approaches and variants, evaluating them on correctness, fairness, efficiency, and other metrics using real-world datasets, and providing insights on performance impacts and practical selection principles.

Classification, a heavily-studied data-driven machine learning task, drives an increasing number of prediction systems involving critical human decisions such as loan approval and criminal risk assessment. However, classifiers often demonstrate discriminatory behavior, especially when presented with biased data. Consequently, fairness in classification has emerged as a high-priority research area. Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness, including the topic of fair classification. The interdisciplinary efforts in fair classification, with machine learning research having the largest presence, have resulted in a large number of fairness notions and a wide range of approaches that have not been systematically evaluated and compared. In this paper, we contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, robustness to data errors, sensitivity to underlying ML model, data efficiency, and stability using a variety of metrics and real-world datasets. Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance. We also discuss general principles for choosing approaches suitable for different practical settings, and identify areas where data-management-centric solutions are likely to have the most impact.
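To make the idea of scoring a classifier on both correctness and fairness concrete, here is a minimal sketch. The abstract does not name the specific metrics used, so the choice of accuracy and demographic parity difference is an assumption for illustration, as are the function names and the toy data; none of this is taken from the paper's code.

```python
# Illustrative only: accuracy as a correctness metric and demographic
# parity difference as one possible fairness metric (both assumed here,
# not taken from the paper).

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def positive_rate(y_pred, group, value):
    """Share of positive predictions within one sensitive group."""
    preds = [p for p, g in zip(y_pred, group) if g == value]
    return sum(preds) / len(preds)

def demographic_parity_difference(y_pred, group):
    """Gap between the highest and lowest group positive rates (0 means parity)."""
    rates = [positive_rate(y_pred, group, v) for v in set(group)]
    return max(rates) - min(rates)

# Hypothetical toy data: binary labels, binary predictions, two groups.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

acc = accuracy(y_true, y_pred)
dpd = demographic_parity_difference(y_pred, group)
print(f"accuracy={acc:.2f}, demographic parity difference={dpd:.2f}")
```

On this toy data the classifier is reasonably accurate yet assigns positive predictions to group "a" far more often than to group "b", which is the kind of correctness versus fairness tension a comparative evaluation has to report side by side.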

Code Implementations: 1 repo
