LG AP ML

Feb 4, 2022

Discovering Distribution Shifts using Latent Space Representations

Leo Betthauser, Urszula Chajewska, Maurice Diesendruck, Rohith Pesala

arXiv:2202.02339v2

4.65 citations

Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the challenges of model selection and practical application that practitioners face when using representation learning. The advance is incremental, since it builds on existing embedding methods.

The paper tackles the problem of detecting distribution shifts in the data passed to embedding models, as a way to assess how well those models generalize. It proposes two non-parametric tests based on embedding space geometry, both of which detect shifts in synthetic and real-world datasets.

Rapid progress in representation learning has led to a proliferation of embedding models, and to associated challenges of model selection and practical application. It is non-trivial to assess a model's generalizability to new candidate datasets, and failure to generalize may lead to poor performance on downstream tasks. Distribution shifts are one cause of reduced generalizability and are often difficult to detect in practice. In this paper, we use the embedding space geometry to propose a non-parametric framework for detecting distribution shifts, and specify two tests. The first test detects shifts by establishing a robustness boundary, determined by an intelligible performance criterion, for comparing reference and candidate datasets. The second test detects shifts by featurizing and classifying multiple subsamples of two datasets as in-distribution and out-of-distribution. In evaluation, both tests detect model-impacting distribution shifts, in various shift scenarios, for both synthetic and real-world datasets.
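The general idea behind the second test (featurize subsamples of two datasets, then classify them) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features or the classifier, so the per-subsample features (mean and standard deviation per embedding dimension), the nearest-centroid classifier, and the names `subsample_features` and `shift_score` are all assumptions chosen for illustration. Each dataset is split into disjoint halves so that the classifier is scored on subsamples drawn from points it never saw.

```python
import numpy as np


def subsample_features(embeddings, n_subsamples, subsample_size, rng):
    """Featurize random subsamples by per-dimension mean and std (an assumed featurization)."""
    feats = []
    for _ in range(n_subsamples):
        idx = rng.choice(len(embeddings), size=subsample_size, replace=False)
        sub = embeddings[idx]
        feats.append(np.concatenate([sub.mean(axis=0), sub.std(axis=0)]))
    return np.asarray(feats)


def shift_score(reference, candidate, n_subsamples=100, subsample_size=50, seed=0):
    """Held-out accuracy of a nearest-centroid classifier that separates
    reference subsamples from candidate subsamples.

    Accuracy near 0.5 suggests no detectable shift; accuracy near 1.0
    suggests the two embedding sets are easy to tell apart.
    """
    rng = np.random.default_rng(seed)

    def split(x):
        # Disjoint train/test halves, so test subsamples share no points with training.
        perm = rng.permutation(len(x))
        half = len(x) // 2
        return x[perm[:half]], x[perm[half:]]

    ref_train, ref_test = split(reference)
    cand_train, cand_test = split(candidate)

    # "Training": one centroid per class in subsample-feature space.
    mu_ref = subsample_features(ref_train, n_subsamples, subsample_size, rng).mean(axis=0)
    mu_cand = subsample_features(cand_train, n_subsamples, subsample_size, rng).mean(axis=0)

    # Evaluation on subsamples drawn from the held-out halves.
    test_ref = subsample_features(ref_test, n_subsamples, subsample_size, rng)
    test_cand = subsample_features(cand_test, n_subsamples, subsample_size, rng)
    test_x = np.vstack([test_ref, test_cand])
    test_y = np.concatenate([np.zeros(len(test_ref)), np.ones(len(test_cand))])

    d_ref = np.linalg.norm(test_x - mu_ref, axis=1)
    d_cand = np.linalg.norm(test_x - mu_cand, axis=1)
    pred = (d_cand < d_ref).astype(float)  # 1.0 means "looks like candidate"
    return float((pred == test_y).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    reference = rng.normal(size=(2000, 8))           # stand-in for reference embeddings
    same = rng.normal(size=(2000, 8))                # same distribution, new sample
    shifted = rng.normal(loc=1.0, size=(2000, 8))    # mean-shifted distribution
    print("no shift:", shift_score(reference, same))
    print("shifted: ", shift_score(reference, shifted))
```

In this sketch the score is only a heuristic. A fuller treatment would calibrate it, for example by repeating the procedure on reference-versus-reference splits to estimate how far the accuracy drifts from 0.5 by chance.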
