LG · Oct 8, 2023

Information-Theoretic Bounds on the Removal of Attribute-Specific Bias from Neural Networks

arXiv:2310.04955v2 · 6 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work highlights a critical limitation in fairness methods for AI, cautioning against their use in small datasets with strong bias.

The paper tackles the problem of removing attribute-specific bias from neural networks, revealing that existing methods are only effective when dataset bias is weak, with performance bounded by bias strength.

Ensuring that a neural network does not rely on protected attributes (e.g., race, sex, age) for its predictions is crucial to advancing fair and trustworthy AI. While several promising methods for removing attribute bias in neural networks have been proposed, their limitations remain under-explored. In this work, we mathematically and empirically reveal an important limitation of attribute bias removal methods in the presence of strong bias. Specifically, we derive a general non-vacuous information-theoretic upper bound on the performance of any attribute bias removal method in terms of the bias strength. We provide extensive experiments on synthetic, image, and census datasets to verify the theoretical bound and its consequences in practice. Our findings show that existing attribute bias removal methods are effective only when the inherent bias in the dataset is relatively weak. We therefore caution against using these methods on smaller datasets, where strong attribute bias can occur, and advocate for methods that can overcome this limitation.
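To make "bias strength" concrete, the following is a minimal sketch that estimates how strongly a target label is tied to a protected attribute in a dataset. It assumes, for illustration only, that bias strength can be read off the empirical conditional entropy H(Y | A) and the mutual information I(Y; A) between target Y and attribute A; the abstract does not spell out the paper's exact definition or bound, and the function names here are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy H(V) in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(targets, attributes):
    """Empirical conditional entropy H(Y | A) in bits from paired samples."""
    n = len(targets)
    joint = Counter(zip(attributes, targets))   # counts of (a, y) pairs
    marginal = Counter(attributes)              # counts of a
    h = 0.0
    for (a, _y), count in joint.items():
        p_joint = count / n                     # p(a, y)
        p_y_given_a = count / marginal[a]       # p(y | a)
        h -= p_joint * math.log2(p_y_given_a)
    return h


def mutual_information(targets, attributes):
    """Empirical I(Y; A) = H(Y) - H(Y | A) in bits."""
    return entropy(targets) - conditional_entropy(targets, attributes)


# Assumed toy data: in the "strong" case the attribute determines the target,
# in the "none" case the two are statistically independent.
strong_y, strong_a = [0, 0, 1, 1], [0, 0, 1, 1]
none_y, none_a = [0, 1, 0, 1], [0, 0, 1, 1]

print("strong bias: H(Y|A) =", conditional_entropy(strong_y, strong_a))
print("no bias:     H(Y|A) =", conditional_entropy(none_y, none_a))
```

Under this assumed reading, a low H(Y | A) (equivalently, a high I(Y; A)) marks the strong-bias regime in which the abstract says removal methods stop being effective. Plug-in estimates like these are themselves unreliable on small samples, which is consistent with the caution about smaller datasets.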

Code Implementations: 1 repo
