LG, AI · Jul 25, 2025

Handling Out-of-Distribution Data: A Survey

arXiv:2507.21160v1 · 16 citations · h-index: 18 · IEEE Trans Knowl Data Eng
Originality: Synthesis-oriented
AI Analysis

The paper addresses the challenge of handling distribution shift, which matters to researchers and practitioners in machine learning. However, it is incremental: it builds on existing literature without introducing new methods.

This survey paper tackles the problem of distribution shift in machine learning, where data distributions differ between training and deployment, by reviewing methods for detecting, measuring, and mitigating its effects, with a focus on out-of-distribution data that previous surveys have overlooked.

In machine learning (ML) and data-driven applications, a significant challenge is the change in data distribution between the training and deployment stages, commonly known as distribution shift. This paper outlines different mechanisms for handling two main types of distribution shift: (i) covariate shift, where the values of features or covariates change between training and test data, and (ii) concept/semantic shift, where the concept the model learned during training shifts because novel classes emerge in the test phase. Our contributions are threefold. First, we formalize distribution shifts, explain how conventional methods fail to handle them adequately, and argue for a model that performs well under all types of distribution shift simultaneously. Second, we discuss why handling distribution shifts is important and provide an extensive review of the methods and techniques that have been developed to detect, measure, and mitigate their effects. Third, we discuss the current state of distribution-shift handling mechanisms and propose future research directions in this area. Overall, we provide a retrospective synopsis of the literature on distribution shift, focusing on out-of-distribution (OOD) data, which has been overlooked in existing surveys.
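The two shift types above can be made concrete with a small detection sketch. This is not the survey's method; it swaps in two standard baselines purely as illustrations. For covariate shift (a change in p(x), with p(y|x) assumed unchanged), it runs a per-feature two-sample Kolmogorov-Smirnov test. For semantic shift (test inputs from classes never seen in training), it uses the maximum-softmax-probability score of Hendrycks and Gimpel. The function names, the significance level of 0.05, and the 0.5 confidence threshold are hypothetical choices, and only NumPy is assumed.

```python
import numpy as np


def ks_statistic(a: np.ndarray, b: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    # Fraction of each sample that is <= every grid point (right-continuous ECDF).
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def covariate_shift_flags(x_train, x_test, alpha=0.05):
    """Per-feature check for a change in p(x), using the asymptotic KS
    critical value c(alpha) * sqrt((n + m) / (n * m)).
    alpha is an illustrative choice, not a value from the survey."""
    n, m = len(x_train), len(x_test)
    c_alpha = np.sqrt(-np.log(alpha / 2) / 2)
    critical = c_alpha * np.sqrt((n + m) / (n * m))
    stats = np.array([ks_statistic(x_train[:, j], x_test[:, j])
                      for j in range(x_train.shape[1])])
    return stats, stats > critical


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def semantic_shift_flags(logits, threshold=0.5):
    """Maximum-softmax-probability baseline: low confidence on every known
    class is treated as a hint that the input may come from a novel class.
    The threshold is a hypothetical value that would need tuning."""
    msp = softmax(logits).max(axis=1)
    return msp, msp < threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic covariate shift: feature 0 keeps its distribution,
    # feature 1 has its mean moved by 1.0 at test time.
    x_train = rng.normal(0.0, 1.0, size=(500, 2))
    x_test = np.column_stack([rng.normal(0.0, 1.0, 500),
                              rng.normal(1.0, 1.0, 500)])
    stats, shifted = covariate_shift_flags(x_train, x_test)
    print("KS statistics:", stats.round(3), "shifted:", shifted)

    # Toy logits over 3 known classes: one confident input, one flat input.
    logits = np.array([[4.0, 0.5, 0.2],
                       [0.3, 0.2, 0.1]])
    msp, novel = semantic_shift_flags(logits)
    print("MSP:", msp.round(3), "possibly novel:", novel)
```

Both checks are deliberately minimal: a per-feature test misses shifts that only appear in the joint distribution of features, and a fixed confidence threshold has to be calibrated on held-out data before it says anything reliable about novel classes.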
