LG · AI · CV · Sep 5, 2025

Beyond I-Con: Exploring New Dimension of Distance Measures in Representation Learning

Jasmine Shone, Shaden Alshammari, Mark Hamilton, Zhening Li, William Freeman

arXiv:2509.04734v1 · 4.1 · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses optimization challenges in representation learning that matter to ML practitioners, though it appears incremental because it builds directly on the I-Con framework.

The paper argues that the KL divergence underlying many representation learning objectives may be misaligned with the true objective and can cause optimization challenges. It presents Beyond I-Con, a framework that explores alternative divergences and similarity kernels to discover new loss functions. Results include state-of-the-art unsupervised clustering of DINO-ViT embeddings using total variation (TV) distance, supervised contrastive learning that outperforms the standard approach by pairing TV with a distance-based similarity kernel, and dimensionality reduction with bounded f-divergences that beats SNE both qualitatively and on downstream tasks.
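
To make the asymmetry and unboundedness of KL concrete, here is a minimal sketch in plain Python comparing KL divergence with TV distance and with Jensen-Shannon (JS) divergence on two-outcome distributions. JS is used only as one example of a bounded f-divergence; the summary does not say which bounded f-divergence the authors chose, and the function names are illustrative rather than taken from the paper's code.

```python
import math

def kl(p, q):
    """KL(p || q) = sum_j p_j * log(p_j / q_j); terms with p_j = 0 contribute 0.
    Returns inf if q_j = 0 where p_j > 0: this is the unboundedness of KL."""
    total = 0.0
    for pj, qj in zip(p, q):
        if pj > 0:
            if qj == 0:
                return math.inf
            total += pj * math.log(pj / qj)
    return total

def tv(p, q):
    """Total variation distance: half the L1 distance, symmetric and always in [0, 1]."""
    return 0.5 * sum(abs(pj - qj) for pj, qj in zip(p, q))

def js(p, q):
    """Jensen-Shannon divergence: a symmetric f-divergence bounded above by log 2.
    The mixture m is positive wherever p or q is, so kl() never returns inf here."""
    m = [0.5 * (pj + qj) for pj, qj in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

if __name__ == "__main__":
    p = [0.5, 0.5]
    for s in (1e-2, 1e-4, 1e-8):
        q = [1 - s, s]  # q puts vanishing mass on the second outcome
        print(f"s={s:.0e}  KL(p||q)={kl(p, q):.2f}  KL(q||p)={kl(q, p):.2f}  "
              f"TV={tv(p, q):.3f}  JS={js(p, q):.3f}")
```

As `s` shrinks, KL(p||q) grows without bound and diverges sharply from KL(q||p), while TV stays at most 1 and JS at most log 2. Bounded, symmetric values like these are the properties the abstract suggests can ease optimization.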

The Information Contrastive (I-Con) framework revealed that over 23 representation learning methods implicitly minimize KL divergence between data and learned distributions that encode similarities between data points. However, a KL-based loss may be misaligned with the true objective, and properties of KL divergence such as asymmetry and unboundedness may create optimization challenges. We present Beyond I-Con, a framework that enables systematic discovery of novel loss functions by exploring alternative statistical divergences and similarity kernels. Key findings: (1) on unsupervised clustering of DINO-ViT embeddings, we achieve state-of-the-art results by modifying the PMI algorithm to use total variation (TV) distance; (2) on supervised contrastive learning, we outperform the standard approach by using TV and a distance-based similarity kernel instead of KL and an angular kernel; (3) on dimensionality reduction, we achieve superior qualitative results and better performance on downstream tasks than SNE by replacing KL with a bounded f-divergence. Our results highlight the importance of considering divergence and similarity kernel choices in representation learning optimization.
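
The abstract treats a loss as a choice of divergence plus a choice of similarity kernel. The sketch below assumes an I-Con-style objective that averages, over anchor points i, a divergence between a target neighbor distribution p(·|i) and a learned one q(·|i) computed from embeddings. The supervised target (uniform over the other points with the same label), the exact angular kernel (softmax of cosine similarity divided by a temperature `tau`), the distance-based kernel (softmax of negative squared Euclidean distance divided by `tau`), and all function names are hypothetical choices for illustration, not the authors' definitions.

```python
import math

def softmax_rows(scores):
    """Row-wise softmax over j != i, giving a learned neighbor distribution q(j | i)."""
    rows = []
    for i, row in enumerate(scores):
        m = max(s for j, s in enumerate(row) if j != i)  # subtract max for stability
        exps = [0.0 if j == i else math.exp(s - m) for j, s in enumerate(row)]
        z = sum(exps)
        rows.append([e / z for e in exps])
    return rows

def angular_kernel(emb, tau=0.5):
    """Hypothetical angular kernel: q(j|i) proportional to exp(cos(z_i, z_j) / tau).
    Embeddings must be nonzero vectors."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    return softmax_rows([[cos(a, b) / tau for b in emb] for a in emb])

def distance_kernel(emb, tau=0.5):
    """Hypothetical distance-based kernel: q(j|i) proportional to exp(-||z_i - z_j||^2 / tau)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return softmax_rows([[-sq_dist(a, b) / tau for b in emb] for a in emb])

def label_neighbors(labels):
    """Hypothetical supervised target p(j|i): uniform over the other points with i's label.
    Every label must occur at least twice."""
    rows = []
    for i, yi in enumerate(labels):
        same = [j for j, yj in enumerate(labels) if j != i and yj == yi]
        rows.append([1.0 / len(same) if j in same else 0.0 for j in range(len(labels))])
    return rows

def kl(p, q):
    # KL(p || q); assumes q > 0 wherever p > 0 (p is zero on the diagonal)
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def tv(p, q):
    # Total variation distance, bounded in [0, 1]
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def icon_style_loss(p_rows, q_rows, divergence):
    """Average over anchors i of divergence(p(.|i), q(.|i))."""
    return sum(divergence(p, q) for p, q in zip(p_rows, q_rows)) / len(p_rows)

if __name__ == "__main__":
    emb = [[1.0, 0.1], [0.9, 0.2], [-1.0, 0.0], [-0.8, -0.3]]
    labels = [0, 0, 1, 1]
    p = label_neighbors(labels)
    for name, kernel in (("angular", angular_kernel), ("distance", distance_kernel)):
        q = kernel(emb)
        print(f"{name:8s}  KL={icon_style_loss(p, q, kl):.4f}  TV={icon_style_loss(p, q, tv):.4f}")
```

Swapping `kl` for `tv`, or `angular_kernel` for `distance_kernel`, changes a single argument, which mirrors the systematic search over divergence and kernel choices that the abstract describes. A real training loop would compute these quantities with an autodiff library and minimize the loss over the encoder's parameters.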
