LG
Oct 1, 2021

A survey on datasets for fairness-aware machine learning

arXiv:2110.00530v3
348 citations
Originality: Synthesis-oriented
AI Analysis

This is an incremental survey that helps researchers evaluate fairness-aware ML methods by providing insights into benchmark datasets.

The paper surveys real-world datasets used for fairness-aware machine learning, focusing on tabular data, and analyzes attribute relationships using Bayesian networks and exploratory analysis to understand bias.

As decision-making increasingly relies on Machine Learning (ML) and (big) data, the issue of fairness in data-driven Artificial Intelligence (AI) systems is receiving increasing attention from both research and industry. A large variety of fairness-aware machine learning solutions have been proposed which involve fairness-related interventions in the data, learning algorithms and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware machine learning. We focus on tabular data as the most common data representation for fairness-aware machine learning. We start our analysis by identifying relationships between the different attributes, particularly w.r.t. protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate the interesting relationships using exploratory analysis.
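The kind of exploratory check the abstract alludes to, relating a protected attribute to the class attribute, can be illustrated with a minimal sketch. The records, the column names `sex` and `income`, and the helper `positive_rate_by_group` below are all hypothetical and are not taken from the paper or its datasets; the sketch only compares the share of positive labels per group.

```python
from collections import defaultdict


def positive_rate_by_group(rows, protected, label, positive):
    """Return the share of rows with the positive label, per protected group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for row in rows:
        group_counts = counts[row[protected]]
        group_counts[0] += row[label] == positive
        group_counts[1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


# Hypothetical toy records, not drawn from any dataset in the survey.
toy_rows = [
    {"sex": "F", "income": "high"},
    {"sex": "F", "income": "low"},
    {"sex": "F", "income": "low"},
    {"sex": "F", "income": "low"},
    {"sex": "M", "income": "high"},
    {"sex": "M", "income": "high"},
    {"sex": "M", "income": "low"},
    {"sex": "M", "income": "low"},
]

rates = positive_rate_by_group(toy_rows, "sex", "income", "high")
# Difference in positive rates between the two groups (a simple parity gap).
gap = rates["M"] - rates["F"]
print(rates, gap)
```

A nonzero gap on real data would only flag a relationship worth inspecting further; it does not by itself establish the cause of the disparity.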

Code Implementations: 1 repo