10 Years of Fair Representations: Challenges and Opportunities
This work critically assesses a foundational fairness technique, revealing significant vulnerabilities that could mislead practitioners into deploying seemingly fair but biased models.
The paper revisits Fair Representation Learning (FRL) after ten years, showing through theoretical analysis and large-scale experiments (225,000 model fits) that deterministic, unquantized FRL methods struggle to remove sensitive information, with AutoML-based adversarial attacks exposing these limitations.
Fair Representation Learning (FRL) is a broad set of techniques, mostly based on neural networks, that seek to learn new representations of data from which sensitive or undesired information has been removed. Methodologically, FRL was pioneered by Richard Zemel et al. about ten years ago. The basic concepts, objectives, and evaluation strategies for FRL methodologies remain unchanged to this day. In this paper, we look back at the first ten years of FRL by i) revisiting its theoretical standing in light of recent work in deep learning theory that shows the hardness of removing information from neural network representations and ii) presenting the results of a large-scale experimental study (225,000 model fits and 110,000 AutoML fits) that we conducted with the objective of improving on the common evaluation scenario for FRL. More specifically, we use automated machine learning (AutoML) to adversarially "mine" sensitive information from supposedly fair representations. Our theoretical and experimental analysis suggests that deterministic, unquantized FRL methodologies have serious issues in removing sensitive information, which is especially troubling as they might seem "fair" at first glance.
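The idea of adversarially mining sensitive information can be illustrated with a toy sketch. This is not the paper's pipeline and uses no AutoML: it assumes a hypothetical two-dimensional representation in which the sensitive attribute `s` is hidden in the sign of `z1 * z2`, and it swaps in a 1-nearest-neighbour classifier as a stand-in for a stronger attacker. All function names are illustrative. The point is only that a representation can look clean to a weak linear probe while a more flexible attacker still recovers `s`.

```python
import random


def make_representation(n, rng):
    """Toy 'fair-looking' representation (an assumption, not real FRL output):
    the sensitive attribute s is hidden in the sign of z1 * z2."""
    data = []
    for _ in range(n):
        z = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        s = 1 if z[0] * z[1] > 0 else 0
        data.append((z, s))
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def centroid_attacker(train):
    """Weak linear probe: predict the class whose mean is closer."""
    means = {}
    for label in (0, 1):
        pts = [z for z, s in train if s == label]
        means[label] = (
            sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts),
        )
    return lambda z: min((0, 1), key=lambda c: sq_dist(z, means[c]))


def nearest_neighbour_attacker(train):
    """Stronger non-linear attacker: copy the label of the closest training point."""
    return lambda z: min(train, key=lambda item: sq_dist(z, item[0]))[1]


def accuracy(predict, test):
    """Fraction of test points whose sensitive attribute is recovered."""
    return sum(predict(z) == s for z, s in test) / len(test)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train = make_representation(400, rng)
test = make_representation(200, rng)

weak_acc = accuracy(centroid_attacker(train), test)
strong_acc = accuracy(nearest_neighbour_attacker(train), test)
print(f"linear probe: {weak_acc:.2f}, 1-NN attacker: {strong_acc:.2f}")
```

An evaluation that relied only on the weak probe would report a representation like this as carrying little sensitive information, which is the kind of false reassurance the abstract warns about.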