CV · Apr 16, 2019

REPAIR: Removing Representation Bias by Dataset Resampling

arXiv:1904.07911v1 · 308 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses how biased datasets harm model generalization, a concern for both researchers and practitioners in machine learning. The advance is incremental, since it builds on existing bias reduction methods.

The paper tackles representation bias in machine learning datasets, where models exploit dataset biases instead of learning the underlying task. It proposes the REPAIR algorithm, which reduces this bias by resampling the dataset and leads to improved generalization in experiments with synthetic and action recognition data.

Modern machine learning datasets can have biases for certain representations that are leveraged by algorithms to achieve high performance without learning to solve the underlying task. This problem is referred to as "representation bias". The question of how to reduce the representation biases of a dataset is investigated and a new dataset REPresentAtion bIas Removal (REPAIR) procedure is proposed. This formulates bias minimization as an optimization problem, seeking a weight distribution that penalizes examples easy for a classifier built on a given feature representation. Bias reduction is then equated to maximizing the ratio between the classification loss on the reweighted dataset and the uncertainty of the ground-truth class labels. This is a minimax problem that REPAIR solves by alternatingly updating classifier parameters and dataset resampling weights, using stochastic gradient descent. An experimental set-up is also introduced to measure the bias of any dataset for a given representation, and the impact of this bias on the performance of recognition models. Experiments with synthetic and action recognition data show that dataset REPAIR can significantly reduce representation bias, and lead to improved generalization of models trained on REPAIRed datasets. The tools used for characterizing representation bias, and the proposed dataset REPAIR algorithm, are available at https://github.com/JerryYLi/Dataset-REPAIR/.
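
The alternating minimax update described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation (that is in the linked repository). It assumes a 1-D feature, a logistic classifier, full-batch gradients in place of stochastic ones, per-example weights parameterized as w_i = sigmoid(score_i), and bias measured as 1 - loss / label entropy. All function names and hyperparameters below are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def example_losses(a, b, xs, ys):
    """Per-example cross-entropy of a logistic classifier on the 1-D feature."""
    losses = []
    for x, y in zip(xs, ys):
        p = sigmoid(a * x + b)
        losses.append(-math.log(p if y == 1 else 1.0 - p))
    return losses

def weighted_stats(ws, losses, ys):
    """Reweighted loss, reweighted label entropy, and reweighted class masses."""
    total = sum(ws)
    loss = sum(w * l for w, l in zip(ws, losses)) / total
    q1 = sum(w for w, y in zip(ws, ys) if y == 1) / total
    q = (1.0 - q1, q1)
    entropy = -sum(qc * math.log(qc) for qc in q)
    return loss, entropy, q

def classifier_step(a, b, ws, xs, ys, lr):
    """One gradient-descent step on the reweighted classification loss."""
    total = sum(ws)
    grad_a = grad_b = 0.0
    for w, x, y in zip(ws, xs, ys):
        err = sigmoid(a * x + b) - y
        grad_a += w / total * err * x
        grad_b += w / total * err
    return a - lr * grad_a, b - lr * grad_b

def repair_sketch(xs, ys, steps=300, lr_theta=0.5, lr_w=2.0):
    """Alternate classifier descent and weight ascent on loss / entropy."""
    a = b = 0.0
    scores = [0.0] * len(xs)  # assumption: w_i = sigmoid(score_i) keeps w_i in (0, 1)
    for _ in range(steps):
        ws = [sigmoid(s) for s in scores]
        a, b = classifier_step(a, b, ws, xs, ys, lr_theta)
        losses = example_losses(a, b, xs, ys)
        loss, entropy, q = weighted_stats(ws, losses, ys)
        total = sum(ws)
        for k, (w, l, y) in enumerate(zip(ws, losses, ys)):
            d_loss = (l - loss) / total                          # d(loss)/d(w_k)
            d_entropy = -(math.log(q[y]) + entropy) / total      # d(entropy)/d(w_k)
            d_ratio = (d_loss * entropy - loss * d_entropy) / entropy ** 2
            scores[k] += lr_w * d_ratio * w * (1.0 - w)          # ascent step
    return [sigmoid(s) for s in scores]

def bias(ws, xs, ys, steps=2000, lr=0.5):
    """Fit a fresh classifier on the weighted data; return 1 - loss / entropy."""
    a = b = 0.0
    for _ in range(steps):
        a, b = classifier_step(a, b, ws, xs, ys, lr)
    loss, entropy, _ = weighted_stats(ws, example_losses(a, b, xs, ys), ys)
    return 1.0 - loss / entropy

# Toy data: the feature agrees with the label for 8 of 10 examples per class.
xs = [1.0] * 8 + [-1.0] * 2 + [-1.0] * 8 + [1.0] * 2
ys = [1] * 10 + [0] * 10
weights = repair_sketch(xs, ys)
bias_before = bias([1.0] * len(xs), xs, ys)
bias_after = bias(weights, xs, ys)
print(bias_before, bias_after)
```

In this toy setting, the examples that the biased feature classifies easily are down-weighted and the ones it misleads are up-weighted, so a classifier fit on the reweighted data extracts less from that feature. A resampled dataset could then be drawn by keeping each example with probability proportional to its weight, which is one plausible reading of "resampling weights" rather than a confirmed detail of the paper.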

Code Implementations: 1 repo
