CV · AI · Feb 3

Global Geometry Is Not Enough for Vision Representations

arXiv:2602.03282v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work identifies a critical limitation in current representation learning for vision: global geometry alone is insufficient for modeling composite structure. The contribution is incremental, but it matters for improving both evaluation and training methods.

The paper challenges the assumption that global embedding geometry is sufficient for robust vision representations, showing that standard geometric metrics have near-zero correlation with compositional binding across 21 encoders, while functional sensitivity measured by the input-output Jacobian reliably tracks this capability.
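The headline comparison is a correlation, across encoders, between a per-encoder geometry statistic and a per-encoder binding score. The sketch below assumes Spearman rank correlation and uses the uniformity statistic of Wang and Isola (2020) as a stand-in geometric metric; neither choice is confirmed here as the paper's protocol, and the helper names `uniformity`, `rankdata`, and `spearman` are hypothetical.

```python
import numpy as np


def uniformity(z: np.ndarray, t: float = 2.0) -> float:
    """Uniformity statistic (Wang & Isola, 2020): log of the mean of
    exp(-t * ||z_i - z_j||^2) over distinct pairs of L2-normalized embeddings.
    Lower values mean the embeddings are more spread out on the hypersphere."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(z), k=1)  # each distinct pair counted once
    return float(np.log(np.mean(np.exp(-t * sq[iu]))))


def rankdata(x) -> np.ndarray:
    """1-based ranks; tied values share the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        mask = x == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman(a, b) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(rankdata(a), rankdata(b))[0, 1])
```

In use, one would compute `uniformity` on each encoder's embeddings of a shared image set, collect one binding score per encoder from a separate compositional benchmark, and pass the two per-encoder lists to `spearman`; a value near zero would correspond to the kind of near-zero correlation described above.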

A common assumption in representation learning is that globally well-distributed embeddings support robust and generalizable representations. This focus has shaped both training objectives and evaluation protocols, implicitly treating global geometry as a proxy for representational competence. While global geometry effectively encodes which elements are present, it is often insensitive to how they are composed. We investigate this limitation by testing the ability of geometric metrics to predict compositional binding across 21 vision encoders. We find that standard geometry-based statistics exhibit near-zero correlation with compositional binding. In contrast, functional sensitivity, as measured by the input-output Jacobian, reliably tracks this capability. We further provide an analytic account showing that this disparity arises from objective design, as existing losses explicitly constrain embedding geometry but leave the local input-output mapping unconstrained. These results suggest that global embedding geometry captures only a partial view of representational competence and establish functional sensitivity as a critical complementary axis for modeling composite structure.
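The abstract does not specify how the input-output Jacobian is summarized. A minimal sketch, assuming functional sensitivity is taken as the mean Frobenius norm of the encoder's Jacobian over a batch of inputs, estimated by central finite differences; `jacobian_fd`, `functional_sensitivity`, and the step size `eps` are hypothetical names and choices for illustration, not the paper's method.

```python
import numpy as np


def jacobian_fd(f, x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Central finite-difference estimate of df/dx at x.
    Returns an array of shape (output_dim, input_dim)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = eps  # perturb one input coordinate at a time
        cols.append((np.asarray(f(x + e)) - np.asarray(f(x - e))).ravel() / (2 * eps))
    return np.stack(cols, axis=1)


def functional_sensitivity(f, xs: np.ndarray, eps: float = 1e-5) -> float:
    """Mean Frobenius norm of the input-output Jacobian over a batch of inputs."""
    return float(np.mean([np.linalg.norm(jacobian_fd(f, x, eps)) for x in xs]))
```

As a sanity check, a linear encoder `f(x) = W @ x` has Jacobian `W` at every input, so its sensitivity equals the Frobenius norm of `W`. For a real image encoder, automatic differentiation (for example `jax.jacfwd` or `torch.autograd.functional.jacobian`) would replace the finite-difference loop, which needs two forward passes per input dimension.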

