CL, LG · Oct 16, 2021

Invariant Language Modeling

arXiv:2110.08413v2 · 294 citations
Originality: Incremental advance
AI Analysis

This work addresses spurious correlations and biases in language models, a critical issue for NLP applications. It is an incremental advance, however, because it builds on existing IRM methods.

The paper tackles spurious correlations and poor generalization in pretrained language models. It proposes invariant language modeling, a framework that adapts invariant risk minimization (IRM) to learn invariant representations. The reported benefits are better out-of-domain generalization and removal of structured noise, at negligible computational overhead.

Large pretrained language models are critical components of modern NLP pipelines. Yet, they suffer from spurious correlations, poor out-of-domain generalization, and biases. Inspired by recent progress in causal machine learning, in particular the invariant risk minimization (IRM) paradigm, we propose invariant language modeling, a framework for learning invariant representations that generalize better across multiple environments. In particular, we adapt a game-theoretic formulation of IRM (IRM-games) to language models, where the invariance emerges from a specific training schedule in which all the environments compete to optimize their own environment-specific loss by updating subsets of the model in a round-robin fashion. We focus on controlled experiments to precisely demonstrate the ability of our method to (i) remove structured noise, (ii) ignore specific spurious correlations without affecting global performance, and (iii) achieve better out-of-domain generalization. These benefits come with a negligible computational overhead compared to standard training, do not require changing the local loss, and can be applied to any language model. We believe this framework is promising to help mitigate spurious correlations and biases in language models.
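To make the round-robin schedule described in the abstract more concrete, here is a minimal sketch in Python. It is not the authors' implementation. A toy linear regressor stands in for a language model, and several details are illustrative assumptions rather than facts taken from the abstract: the model is split into a shared body (`w`) plus one scalar head per environment, the heads' outputs are averaged into a single ensemble prediction, and each environment's turn updates the shared body together with that environment's own head. The function names `predict`, `env_loss`, `round_robin_step`, and `train` are hypothetical.

```python
from typing import List, Tuple

# A toy environment: a list of (features, target) pairs.
Env = List[Tuple[List[float], float]]


def predict(w: List[float], heads: List[float], x: List[float]) -> float:
    """Ensemble prediction: shared body (a dot product) scaled by the mean head.

    Averaging the heads is an assumption made for this sketch.
    """
    s = sum(wj * xj for wj, xj in zip(w, x))
    return s * sum(heads) / len(heads)


def env_loss(w: List[float], heads: List[float], env: Env) -> float:
    """Mean squared error of the ensemble prediction on one environment."""
    return sum((predict(w, heads, x) - y) ** 2 for x, y in env) / len(env)


def round_robin_step(
    w: List[float], heads: List[float], envs: List[Env], e: int, lr: float
) -> Tuple[List[float], List[float]]:
    """One gradient step on environment e's own loss.

    Only the shared body and head e are updated (assumed parameter split);
    every other environment's head is left untouched.
    """
    n_heads, n = len(heads), len(envs[e])
    a_bar = sum(heads) / n_heads
    grad_w = [0.0] * len(w)
    grad_a = 0.0
    for x, y in envs[e]:
        s = sum(wj * xj for wj, xj in zip(w, x))
        r = a_bar * s - y  # residual of the ensemble prediction
        for j, xj in enumerate(x):
            grad_w[j] += 2.0 * r * a_bar * xj / n
        # d(a_bar)/d(heads[e]) = 1 / n_heads
        grad_a += 2.0 * r * s / (n_heads * n)
    new_w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    new_heads = list(heads)
    new_heads[e] -= lr * grad_a
    return new_w, new_heads


def train(
    envs: List[Env], dim: int, steps: int, lr: float
) -> Tuple[List[float], List[float]]:
    """Cycle through the environments in a fixed round-robin order."""
    w = [0.1] * dim
    heads = [1.0] * len(envs)
    for t in range(steps):
        e = t % len(envs)  # environment whose turn it is
        w, heads = round_robin_step(w, heads, envs, e, lr)
    return w, heads


if __name__ == "__main__":
    # Two illustrative environments whose target is always 2 * x[0].
    envs = [
        [([1.0, 0.0], 2.0), ([2.0, 1.0], 4.0)],
        [([1.0, 1.0], 2.0), ([3.0, 0.0], 6.0)],
    ]
    w, heads = train(envs, dim=2, steps=2000, lr=0.01)
    print("body:", w, "heads:", heads)
```

Each call to `round_robin_step` lets one environment minimize its own loss against an ensemble that all environments contribute to, which is the competitive, turn-based structure the abstract attributes to IRM-games. In a real language model, the body and heads would be network modules rather than a weight vector and scalars.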

Code implementations: 1 repo