CL · Oct 12, 2020

Measuring and Reducing Gendered Correlations in Pre-trained Models

arXiv:2010.06032v2 · 310 citations
Originality: Incremental advance
AI Analysis

This paper addresses unintended biases in AI models, which matters for fairness in downstream applications. The advance is incremental because it builds on existing concerns about model artifacts.

The paper tackles gendered correlations in pre-trained models. It shows that models with similar accuracy can encode these correlations at very different rates, and that general-purpose techniques can reduce them, with trade-offs.

Pre-trained models have revolutionized natural language understanding. However, researchers have found they can encode artifacts undesired in many applications, such as professions correlating with one gender more than another. We explore such gendered correlations as a case study for how to address unintended correlations in pre-trained models. We define metrics and reveal that it is possible for models with similar accuracy to encode correlations at very different rates. We show how measured correlations can be reduced with general-purpose techniques, and highlight the trade-offs different strategies have. With these results, we make recommendations for training robust models: (1) carefully evaluate unintended correlations, (2) be mindful of seemingly innocuous configuration differences, and (3) focus on general mitigations.

