LG · May 20, 2023

Low-Entropy Latent Variables Hurt Out-of-Distribution Performance

arXiv:2305.12238v1
Originality: Incremental advance
AI Analysis

The paper addresses the robustness of machine learning models under distributional shift, which matters for applications that need reliable performance outside the training distribution. The advance is incremental: it builds on existing contrastive learning and entropy analysis methods.

The paper investigates how the entropy of intermediate representations affects model robustness to distributional shift, finding that removing low-entropy bits improves out-of-distribution performance, while high-entropy bit removal harms both in-distribution and out-of-distribution accuracy.

We study the relationship between the entropy of intermediate representations and a model's robustness to distributional shift. We train models consisting of two feed-forward networks end-to-end separated by a discrete $n$-bit channel on an unsupervised contrastive learning task. Different masking strategies are applied after training that remove a proportion of low-entropy bits, high-entropy bits, or randomly selected bits, and the effects on performance are compared to the baseline accuracy with no mask. We hypothesize that the entropy of a bit serves as a guide to its usefulness out-of-distribution (OOD). Through experiment on three OOD datasets we demonstrate that the removal of low-entropy bits can notably benefit OOD performance. Conversely, we find that top-entropy masking disproportionately harms performance both in-distribution (InD) and OOD.
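The masking procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each bit's entropy is estimated empirically from the fraction of inputs on which the bit is 1, and that a masked bit is replaced by 0. The abstract specifies neither of these details. The function names `bit_entropies`, `keep_mask`, and `apply_mask` are hypothetical.

```python
import math
import random


def bit_entropies(codes):
    """Empirical Shannon entropy (in bits) of each bit position.

    codes: list of equal-length sequences of 0/1 values, one per input.
    """
    n_samples = len(codes)
    n_bits = len(codes[0])
    entropies = []
    for j in range(n_bits):
        p = sum(code[j] for code in codes) / n_samples  # estimate of P(bit j = 1)
        if p in (0.0, 1.0):
            entropies.append(0.0)  # a constant bit has zero entropy
        else:
            entropies.append(-p * math.log2(p) - (1 - p) * math.log2(1 - p))
    return entropies


def keep_mask(entropies, fraction, strategy, rng=None):
    """Return a list of booleans: True where the bit is kept.

    strategy: "low" drops the lowest-entropy bits, "high" the
    highest-entropy bits, "random" a uniformly random subset.
    """
    n_bits = len(entropies)
    k = round(fraction * n_bits)  # number of bits to drop
    order = sorted(range(n_bits), key=lambda j: entropies[j])  # ascending entropy
    if strategy == "low":
        drop = order[:k]
    elif strategy == "high":
        drop = order[n_bits - k:]
    elif strategy == "random":
        rng = rng or random.Random()
        drop = rng.sample(range(n_bits), k)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    dropped = set(drop)
    return [j not in dropped for j in range(n_bits)]


def apply_mask(code, keep):
    """Zero out masked bits (assumption: masked bits are set to 0)."""
    return [bit if kept else 0 for bit, kept in zip(code, keep)]


# Toy usage: 4 inputs encoded with a 3-bit channel.
codes = [[0, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
h = bit_entropies(codes)
low = keep_mask(h, fraction=1 / 3, strategy="low")
masked_codes = [apply_mask(code, low) for code in codes]
```

In the setting the abstract describes, entropies would be estimated on in-distribution data after training, and the masked codes would then be passed to the second network to measure InD and OOD accuracy against the unmasked baseline.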
