LG, AI, CV | Sep 5, 2025

Beyond I-Con: Exploring New Dimension of Distance Measures in Representation Learning

arXiv:2509.04734v1 | h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses optimization challenges in representation learning that are relevant to ML practitioners, though it appears incremental because it builds directly on the I-Con framework.

The paper argues that the KL divergence used in representation learning may be misaligned with the true objective and can cause optimization challenges. It presents Beyond I-Con, a framework that explores alternative divergences and similarity kernels to discover novel loss functions. Reported results include state-of-the-art unsupervised clustering on DINO-ViT embeddings using total variation (TV) distance, outperforming standard supervised contrastive learning by replacing KL and an angular kernel with TV and a distance-based kernel, and superior dimensionality reduction using a bounded f-divergence.
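To make the two design axes concrete, here is a minimal NumPy sketch of how a similarity kernel and a divergence might combine into a loss. It is not the authors' implementation. The function names, the temperature value, the use of cosine similarity as the "angular" kernel, and negative squared Euclidean distance as the "distance-based" kernel are all assumptions made for illustration.

```python
import numpy as np

def neighbor_distribution(z, kernel="angular", temperature=0.5):
    """Row-stochastic matrix q[i, j]: probability that j is a neighbor of i.

    Hypothetical helper: the kernel definitions here are illustrative
    assumptions, not taken from the paper.
    """
    z = np.asarray(z, dtype=float)
    if kernel == "angular":
        zn = z / np.linalg.norm(z, axis=1, keepdims=True)
        logits = zn @ zn.T / temperature          # cosine similarity
    elif kernel == "distance":
        diff = z[:, None, :] - z[None, :, :]
        logits = -np.sum(diff ** 2, axis=-1) / temperature  # -||zi - zj||^2
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    np.fill_diagonal(logits, -np.inf)             # a point is not its own neighbor
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def label_distribution(labels):
    """Supervisory p[i, j]: uniform over other points sharing i's label.

    Assumes every class has at least two members.
    """
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)
    return same / same.sum(axis=1, keepdims=True)

def tv_loss(p, q):
    """Mean total variation distance between matching rows of p and q."""
    return float(np.mean(0.5 * np.abs(p - q).sum(axis=1)))

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 4))              # toy embeddings
labels = [0, 0, 0, 1, 1, 1]
p = label_distribution(labels)
for kernel in ("angular", "distance"):
    q = neighbor_distribution(embeddings, kernel=kernel)
    print(kernel, tv_loss(p, q))
```

Swapping `tv_loss` for a KL-based loss, or `"distance"` for `"angular"`, changes only one component at a time, which is the kind of systematic variation the summary describes.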

The Information Contrastive (I-Con) framework revealed that over 23 representation learning methods implicitly minimize KL divergence between data and learned distributions that encode similarities between data points. However, a KL-based loss may be misaligned with the true objective, and properties of KL divergence such as asymmetry and unboundedness may create optimization challenges. We present Beyond I-Con, a framework that enables systematic discovery of novel loss functions by exploring alternative statistical divergences and similarity kernels. Key findings: (1) on unsupervised clustering of DINO-ViT embeddings, we achieve state-of-the-art results by modifying the PMI algorithm to use total variation (TV) distance; (2) on supervised contrastive learning, we outperform the standard approach by using TV and a distance-based similarity kernel instead of KL and an angular kernel; (3) on dimensionality reduction, we achieve superior qualitative results and better performance on downstream tasks than SNE by replacing KL with a bounded f-divergence. Our results highlight the importance of considering divergence and similarity kernel choices in representation learning optimization.
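The asymmetry and unboundedness of KL mentioned above can be checked numerically on small discrete distributions. The sketch below is illustrative only and is not taken from the paper. It compares KL with total variation and with Jensen-Shannon divergence, which is used here as one example of a bounded f-divergence; the paper's specific choice is not stated in this summary.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) in nats; infinite if q is zero somewhere p is positive."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                                   # terms with p = 0 contribute 0
    with np.errstate(divide="ignore"):             # allow log(0) = -inf silently
        return float(np.sum(p[mask] * (np.log(p[mask]) - np.log(q[mask]))))

def tv(p, q):
    """Total variation distance; symmetric and bounded in [0, 1]."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * float(np.sum(np.abs(p - q)))

def js(p, q):
    """Jensen-Shannon divergence; symmetric and bounded by log(2)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Asymmetry: swapping the arguments changes KL but not TV or JS.
p, q = [0.5, 0.5], [0.9, 0.1]
print("KL(p||q) =", kl(p, q), " KL(q||p) =", kl(q, p))
print("TV(p,q)  =", tv(p, q), " TV(q,p)  =", tv(q, p))

# Unboundedness: disjoint supports make KL infinite, while TV and JS saturate.
a, b = [1.0, 0.0], [0.0, 1.0]
print("KL(a||b) =", kl(a, b))
print("TV(a,b)  =", tv(a, b))
print("JS(a,b)  =", js(a, b), " log(2) =", np.log(2))
```

With disjoint supports, KL diverges while TV and JS reach their finite maxima of 1 and log 2.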
