LG AI | May 31

A Fiber Criterion for Representation Identifiability in Supervised Learning

arXiv:2606.01092 | 15.7

AI Analysis

This work clarifies a fundamental identifiability problem for representation learning in supervised settings: properties of a representation cannot, in general, be inferred from predictive behavior alone. This matters for researchers who use supervised models to study representations.

The paper formalizes representation identifiability in supervised learning, showing that a representation property is identifiable from the induced predictor exactly when it is constant on the fibers of the projection from representation-head pairs to that predictor. It demonstrates that predictor-preserving augmentation can alter representation properties without changing predictor behavior, so representation-level claims require assumptions beyond supervised performance.
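
In symbols, the criterion can be sketched as follows. The notation $\mathcal{A}$ (admissible pairs), $\mathcal{F}$ (predictors), $\pi$ (the projection), $P$ (a representation property), and $\bar{P}$ (the descended property, defined on the image of $\pi$) is assumed here for illustration and may differ from the paper's:

$$
\pi:\mathcal{A}\to\mathcal{F},\quad \pi(h,c)=c\circ h,
\qquad
P \text{ identifiable}
\iff \big[\,\pi(h,c)=\pi(h',c') \Rightarrow P(h,c)=P(h',c')\,\big]
\iff \exists\,\bar{P}:\ P=\bar{P}\circ\pi .
$$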

Supervised learning evaluates predictors through their input-output behavior. When a predictor is implemented as a composition $f=c\circ h$, supervised evidence constrains the composite map $f$ but need not determine the representation-head factorization $(h,c)$. This paper formalizes the resulting representation-level identifiability problem: for a class of admissible representation-head pairs, a representation property is identifiable from the induced predictor exactly when it is constant on the fibers of the projection $(h,c)\mapsto c\circ h$, equivalently when it descends to a well-defined property of the predictor. Predictor-preserving augmentation gives a canonical obstruction: auxiliary information can be appended to a representation while the head ignores it, leaving the predictor unchanged but altering properties such as minimality, compression, invariance, equivariance, nuisance information, or semantic accessibility. This construction separates representation identifiability from optimization and finite-sample estimation. Finite-sample diagnostics illustrate, rather than prove, the criterion: exact algebraic witnesses hold the predictor fixed while changing representation diagnostics, and matched-performance Waterbirds models show that different constraints can select different representations at similar supervised performance. The results clarify that representation-level claims require assumptions, objectives, measurements, or inductive biases beyond supervised predictive behavior alone.
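
The augmentation argument can be illustrated with a toy witness. The following is a minimal sketch, not the paper's construction or experiments: the functions `h`, `c`, `h_aug`, `c_aug`, and the `is_invariant` diagnostic are hypothetical names chosen for this example. It appends a raw input coordinate to the representation, gives it zero weight in the head, and checks that the predictor is unchanged while an invariance diagnostic on the representation flips.

```python
import random

def h(x):
    """Base representation: uses only the first two input coordinates."""
    return (x[0] + x[1], x[0] - x[1])

def c(z):
    """Base head: a fixed linear readout of the representation."""
    return 2.0 * z[0] - z[1]

def h_aug(x):
    """Augmented representation: appends the raw third coordinate."""
    return h(x) + (x[2],)

def c_aug(z):
    """Augmented head: gives the appended coordinate zero weight."""
    return c(z[:2]) + 0.0 * z[2]

def is_invariant(rep, xs, delta=1.0):
    """Diagnostic: does rep stay fixed when only x[2] is perturbed?"""
    return all(rep(x) == rep((x[0], x[1], x[2] + delta)) for x in xs)

# Hypothetical inputs, drawn with a fixed seed for reproducibility.
rng = random.Random(0)
xs = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(100)]

# Both pairs lie in the same fiber: they induce the same predictor.
same_predictor = all(c(h(x)) == c_aug(h_aug(x)) for x in xs)

# The representation-level diagnostic nevertheless differs.
base_invariant = is_invariant(h, xs)
aug_invariant = is_invariant(h_aug, xs)

print(same_predictor, base_invariant, aug_invariant)  # True True False
```

Because the two pairs agree as predictors on every input, no amount of supervised evidence separates them, yet invariance to the third coordinate holds for one representation and fails for the other. In the abstract's terms, invariance is not constant on this fiber, so it is not identifiable from the predictor within this admissible class.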
