Generalization vs. Specialization under Concept Shift

arXiv:2409.15582v21 citations | h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses safety risks in ML adoption by analyzing concept shift, but it is incremental as it builds on existing theoretical frameworks for distribution shift.

The paper tackles the brittleness of machine learning models under concept shift. It derives an exact expression for the prediction risk of ridge regression, which reveals phase transitions and nonmonotonic data dependence in test performance. Experiments with transformers show that an overly long context can harm generalization.

Machine learning models are often brittle under distribution shift, i.e., when data distributions at test time differ from those during training. Understanding this failure mode is central to identifying and mitigating safety risks of mass adoption of machine learning. Here we analyze ridge regression under concept shift, a form of distribution shift in which the input-label relationship changes at test time. We derive an exact expression for prediction risk in the thermodynamic limit. Our results reveal nontrivial effects of concept shift on generalization performance, including a phase transition between weak and strong concept shift regimes and nonmonotonic data dependence of test performance even when double descent is absent. Our theoretical results are in good agreement with experiments based on transformers pretrained to solve linear regression; under concept shift, an overly long context can be detrimental to the generalization performance of next-token prediction. Finally, our experiments on MNIST and FashionMNIST suggest that this intriguing behavior is also present in classification problems.
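
To make the setting concrete, the following is a minimal simulation sketch of ridge regression under concept shift. It is not the paper's setup or its exact risk formula. The isotropic Gaussian features, the unit-norm coefficient vectors, the `alignment` parameter (the cosine between training and test coefficients), and the function names `ridge_fit` and `concept_shift_risk` are all assumptions chosen for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimator: solve (X^T X + lam * I) beta = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def concept_shift_risk(n, d, lam, alignment, noise=0.5, n_test=2000, seed=0):
    """Empirical test MSE when the test coefficients differ from the training ones.

    Assumed (hypothetical) data model: x ~ N(0, I_d), y = x . beta + noise * eps,
    with unit-norm beta_train and beta_test whose cosine equals `alignment`.
    """
    rng = np.random.default_rng(seed)

    # Training coefficients: a random unit vector.
    beta_train = rng.standard_normal(d)
    beta_train /= np.linalg.norm(beta_train)

    # A unit vector orthogonal to beta_train, used to rotate the test coefficients.
    v = rng.standard_normal(d)
    v -= (v @ beta_train) * beta_train
    v /= np.linalg.norm(v)
    beta_test = alignment * beta_train + np.sqrt(1.0 - alignment**2) * v

    # Fit on data generated with the training input-label relationship.
    X = rng.standard_normal((n, d))
    y = X @ beta_train + noise * rng.standard_normal(n)
    beta_hat = ridge_fit(X, y, lam)

    # Evaluate on data generated with the shifted relationship.
    X_test = rng.standard_normal((n_test, d))
    y_test = X_test @ beta_test + noise * rng.standard_normal(n_test)
    return float(np.mean((X_test @ beta_hat - y_test) ** 2))


if __name__ == "__main__":
    # Sweep the training set size with and without concept shift.
    for alignment in (1.0, 0.0):
        for n in (25, 100, 400):
            risk = concept_shift_risk(n=n, d=50, lam=1.0, alignment=alignment)
            print(f"alignment={alignment:.1f}  n={n:4d}  test MSE={risk:.3f}")
```

Setting `alignment=1.0` recovers the no-shift case, while smaller values move the test-time relationship away from the one seen in training. Sweeping `n` at fixed `d` and `lam` is one way to probe how test error depends on the amount of training data in this toy model.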

