CL · AI · May 17, 2023

Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection

arXiv:2305.10204v1 · 230 citations
Originality: Incremental advance
AI Analysis

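This work addresses bias in NLP models for fairness applications. It is an incremental advance: it extends existing concept-removal methods, which can only remove linearly encoded information, to attributes that are encoded non-linearly.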

The paper tackles the problem of neural models encoding social biases by proposing Iterative Gradient-Based Projection (IGBP), which removes non-linearly encoded sensitive attributes such as gender and race. IGBP mitigates bias effectively while having minimal impact on downstream task accuracy.

Natural language processing models tend to learn and encode social biases present in the data. One popular approach for addressing such biases is to eliminate encoded information from the model's representations. However, current methods are restricted to removing only linearly encoded information. In this work, we propose Iterative Gradient-Based Projection (IGBP), a novel method for removing non-linearly encoded concepts from neural representations. Our method consists of iteratively training neural classifiers to predict a particular attribute we seek to eliminate, followed by a projection of the representation onto a hypersurface, such that the classifiers become oblivious to the target attribute. We evaluate the effectiveness of our method on the task of removing gender and race information as sensitive attributes. Our results demonstrate that IGBP is effective in mitigating bias through intrinsic and extrinsic evaluations, with minimal impact on downstream task accuracy.
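The abstract describes the procedure only at a high level. Below is a minimal sketch of one way to realize the train-then-project loop, under our own assumptions rather than the authors' released code: a binary attribute, a one-hidden-layer tanh classifier trained with plain SGD, and a "projection" implemented as length-capped Newton-style steps along the classifier's input gradient until its predicted probability reaches 0.5, i.e. until the vector lies on that classifier's decision hypersurface. The names `MLP`, `project_to_boundary`, and `igbp`, and all hyperparameters, are hypothetical.

```python
# Hypothetical sketch of an IGBP-style loop; NOT the paper's implementation.
# Assumptions: binary attribute labels (0/1), a tanh MLP probe, and projection
# onto the probe's decision boundary via capped Newton steps on its logit.
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class MLP:
    """One-hidden-layer tanh network; sigmoid(logit) estimates p(attribute = 1 | x)."""

    def __init__(self, dim, hidden, rng):
        self.W = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(hidden)]
        self.b = [0.0] * hidden
        self.v = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
        self.c = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
             for row, bj in zip(self.W, self.b)]
        return h, sum(vj * hj for vj, hj in zip(self.v, h)) + self.c

    def prob(self, x):
        return sigmoid(self.forward(x)[1])

    def input_grad(self, x):
        """Gradient of the logit with respect to the input vector x."""
        h, _ = self.forward(x)
        g = [0.0] * len(x)
        for row, vj, hj in zip(self.W, self.v, h):
            scale = vj * (1.0 - hj * hj)  # d tanh(u)/du = 1 - tanh(u)^2
            for i, w in enumerate(row):
                g[i] += scale * w
        return g

    def fit(self, X, y, epochs=200, lr=0.1):
        """Plain per-example SGD on binary cross-entropy."""
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, logit = self.forward(x)
                d = sigmoid(logit) - t  # dLoss/dlogit for BCE with a sigmoid output
                for j, hj in enumerate(h):
                    dh = d * self.v[j] * (1.0 - hj * hj)  # uses v[j] before its update
                    self.v[j] -= lr * d * hj
                    self.b[j] -= lr * dh
                    for i, xi in enumerate(x):
                        self.W[j][i] -= lr * dh * xi
                self.c -= lr * d


def project_to_boundary(clf, x, steps=100, max_step=0.5, tol=1e-6):
    """Move x along the input gradient until the classifier's logit is ~0 (p = 0.5)."""
    x = list(x)
    for _ in range(steps):
        _, logit = clf.forward(x)
        if abs(logit) < tol:
            break
        g = clf.input_grad(x)
        norm2 = sum(gi * gi for gi in g)
        if norm2 < 1e-12:  # flat region: no usable direction
            break
        # Newton-style step toward the level set logit(x) = 0, capped in length.
        scale = logit / norm2
        length = abs(scale) * math.sqrt(norm2)
        if length > max_step:
            scale *= max_step / length
        x = [xi - scale * gi for xi, gi in zip(x, g)]
    return x


def igbp(X, y, n_iters=3, hidden=8, seed=0):
    """Train a fresh classifier, project every vector onto its boundary, repeat."""
    rng = random.Random(seed)
    X = [list(x) for x in X]
    for _ in range(n_iters):
        clf = MLP(len(X[0]), hidden, rng)
        clf.fit(X, y)
        X = [project_to_boundary(clf, x) for x in X]
    return X
```

In this sketch, `igbp(X, y)` would take encoder vectors `X` and binary attribute labels `y` (e.g. gender) and return the projected vectors. The step-length cap is a design choice of ours: where tanh units saturate, the input gradient becomes tiny and an uncapped Newton step can overshoot far past the boundary.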

Code Implementations: 1 repo
