LG | AI | CY | ML | Jul 3, 2024

FairJob: A Real-World Dataset for Fairness in Online Systems

arXiv:2407.03059v2 | 6 citations | h-index: 6 | Has Code
AI Analysis

This work provides a dataset for researchers and practitioners to study fairness in high-impact domains like advertising, where balancing fairness and utility is a critical industrial challenge.

The authors introduce a fairness-aware dataset for job recommendations in advertising to address the lack of resources for algorithmic fairness research in real-world scenarios. Their experimental evaluations show potential improvements in fairness, along with the associated trade-offs in utility.

We introduce a fairness-aware dataset for job recommendations in advertising, designed to foster research in algorithmic fairness within real-world scenarios. It was collected and prepared to comply with privacy standards and business confidentiality. An additional challenge is the lack of access to protected user attributes such as gender, for which we propose a solution to obtain a proxy estimate. Despite being anonymized and including a proxy for a sensitive attribute, our dataset preserves predictive power and remains a realistic and challenging benchmark. This dataset addresses a significant gap in the availability of fairness-focused resources for high-impact domains like advertising, where what is at stake is access to valuable employment opportunities and where balancing fairness and utility is a common industrial challenge. We also explore various stages in the advertising process where unfairness can occur and introduce a method to compute a fair utility metric for job recommendations in online systems from a biased dataset. Experimental evaluations of bias mitigation techniques on the released dataset demonstrate potential improvements in fairness and the associated trade-offs with utility. The dataset is hosted at https://huggingface.co/datasets/criteo/FairJob. Source code for the experiments is hosted at https://github.com/criteo-research/FairJob-dataset/.
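To make the fairness side of the fairness/utility trade-off concrete, here is a minimal sketch of one common group-fairness measure, the demographic parity gap, computed against a binary proxy attribute. The abstract does not say which metric the authors use, so treating demographic parity as the measure is an assumption. The function names, the variable names `recommended` and `proxy_gender`, and the toy data are hypothetical and not taken from the dataset.

```python
from collections import defaultdict


def positive_rate_by_group(recommended, group):
    """Share of users in each group who were shown the job ad."""
    shown = defaultdict(int)
    total = defaultdict(int)
    for r, g in zip(recommended, group):
        total[g] += 1
        shown[g] += int(r)
    return {g: shown[g] / total[g] for g in total}


def demographic_parity_gap(recommended, group):
    """Largest difference in exposure rate between any two groups.

    0.0 means every group sees the job ad at the same rate.
    """
    rates = positive_rate_by_group(recommended, group)
    return max(rates.values()) - min(rates.values())


# Toy data (hypothetical): 1 = job ad shown, 0 = not shown.
recommended = [1, 1, 1, 0, 1, 0, 0, 0]
# Hypothetical binary proxy for the protected attribute.
proxy_gender = [0, 0, 0, 0, 1, 1, 1, 1]

rates = positive_rate_by_group(recommended, proxy_gender)
gap = demographic_parity_gap(recommended, proxy_gender)
print(rates, gap)
```

A bias mitigation technique would aim to shrink this gap while losing as little predictive utility as possible. Because the attribute is only a proxy, a gap measured this way is an estimate rather than the true disparity.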

Code Implementations: 1 repo
