LG CY DS ML Jun 19, 2023

Correcting Underrepresentation and Intersectional Bias for Classification

arXiv:2306.11112v43.81 citations h-index: 3

Originality: Incremental advance

AI Analysis

This paper addresses fairness and bias in machine learning for underrepresented groups, particularly in intersectional settings. The contribution is incremental, since it builds on existing bias-correction methods.

The paper tackles learning from data with underrepresentation bias, where positive examples are filtered out at unknown rates that differ across sensitive groups. It develops an algorithm that estimates the group-wise drop-out rates from a small unbiased dataset, then constructs a reweighting scheme that approximates the loss on the true distribution. The authors show that this permits efficient learning for model classes of finite VC dimension.

We consider the problem of learning from data corrupted by underrepresentation bias, where positive examples are filtered from the data at different, unknown rates for a fixed number of sensitive groups. We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates, even in settings where intersectional group membership makes learning each intersectional rate computationally infeasible. Using these estimates, we construct a reweighting scheme that allows us to approximate the loss of any hypothesis on the true distribution, even if we only observe the empirical error on a biased sample. From this, we present an algorithm encapsulating this learning and reweighting process along with a thorough empirical investigation. Finally, we define a bespoke notion of PAC learnability for the underrepresentation and intersectional bias setting and show that our algorithm permits efficient learning for model classes of finite VC dimension.
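
The estimate-then-reweight idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes disjoint (non-intersectional) groups, assumes only positive examples are dropped, and uses a simple ratio-of-ratios estimator chosen here for illustration. The function names `estimate_retention` and `reweighted_error`, the `(group, x, label)` tuple layout, and the toy counts are all hypothetical.

```python
from collections import Counter

def estimate_retention(biased, unbiased):
    """Estimate, per group, the fraction of positives that survived filtering.

    Assumption (illustrative, not from the paper): negatives are never dropped,
    so the positive/negative ratio in the biased sample equals the true ratio
    multiplied by the group's retention rate.
    Each example is a (group, x, label) tuple with label in {0, 1}.
    """
    b = Counter((g, y) for g, _, y in biased)
    u = Counter((g, y) for g, _, y in unbiased)
    betas = {}
    for g in {g for g, _, _ in unbiased}:
        if 0 in (b[(g, 1)], b[(g, 0)], u[(g, 1)], u[(g, 0)]):
            raise ValueError(f"group {g!r} needs both labels in both samples")
        biased_ratio = b[(g, 1)] / b[(g, 0)]
        unbiased_ratio = u[(g, 1)] / u[(g, 0)]
        # Retention is a probability, so clip noisy estimates at 1.
        betas[g] = min(1.0, biased_ratio / unbiased_ratio)
    return betas

def reweighted_error(predict, biased, betas):
    """Weighted 0-1 error on the biased sample.

    Surviving positives in group g are up-weighted by 1 / beta_g to stand in
    for the positives that were filtered out; negatives keep weight 1.
    """
    weighted_errors = total_weight = 0.0
    for g, x, y in biased:
        w = 1.0 / betas[g] if y == 1 else 1.0
        total_weight += w
        weighted_errors += w * (predict(x) != y)
    return weighted_errors / total_weight

def make(group, n_pos, n_neg):
    """Build a toy dataset with the given label counts (feature x is unused)."""
    return [(group, 1.0, 1)] * n_pos + [(group, 0.0, 0)] * n_neg

# Toy data: both groups are truly balanced, but the biased sample keeps
# 50% of group "a" positives and 20% of group "b" positives.
unbiased = make("a", 100, 100) + make("b", 100, 100)
biased = make("a", 500, 1000) + make("b", 200, 1000)

betas = estimate_retention(biased, unbiased)
always_negative = lambda x: 0
naive = reweighted_error(always_negative, biased, {"a": 1.0, "b": 1.0})
corrected = reweighted_error(always_negative, biased, betas)
print(betas, naive, corrected)
```

In this constructed example the estimator recovers the retention rates 0.5 and 0.2 by design. The unweighted error of the always-negative classifier on the biased sample is 700/2700, which understates its true error of 0.5 on the balanced population, while the reweighted error equals 0.5. With real, finite samples the estimates would be noisy, and the intersectional setting described in the abstract would need a different treatment of overlapping groups.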
