LGCVJul 6, 2022

Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning

arXiv:2207.02598v135 citationsh-index: 31
Originality Incremental advance
AI Analysis

This addresses the reliability issue for ML practitioners by showing that in-domain performance alone is insufficient for OOD model selection, though it is incremental in formalizing and mitigating underspecification.

The paper tackles the problem of underspecification in machine learning, where models with identical in-domain accuracy can differ in out-of-distribution (OOD) performance, and proposes a method to identify and address this by training multiple models with an independence constraint to discover predictive features, resulting in a global model with superior OOD performance demonstrated on datasets like WILDS-Camelyon17 and GQA.

Machine learning (ML) models are typically optimized for their accuracy on a given dataset. However, this predictive criterion rarely captures all desirable properties of a model, in particular how well it matches a domain expert's understanding of a task. Underspecification refers to the existence of multiple models that are indistinguishable in their in-domain accuracy, even though they differ in other desirable properties such as out-of-distribution (OOD) performance. Identifying these situations is critical for assessing the reliability of ML models. We formalize the concept of underspecification and propose a method to identify and partially address it. We train multiple models with an independence constraint that forces them to implement different functions. They discover predictive features that are otherwise ignored by standard empirical risk minimization (ERM), which we then distill into a global model with superior OOD performance. Importantly, we constrain the models to align with the data manifold to ensure that they discover meaningful features. We demonstrate the method on multiple datasets in computer vision (collages, WILDS-Camelyon17, GQA) and discuss general implications of underspecification. Most notably, in-domain performance cannot serve for OOD model selection without additional assumptions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes