Diversity Is All You Need for Contrastive Learning: Spectral Bounds on Gradient Magnitudes
This work addresses efficiency bottlenecks in contrastive learning for computer vision, improving training speed through spectrum-aware batch selection and gradient stability through in-batch whitening.
The paper improves contrastive learning efficiency by analyzing InfoNCE gradient norms and designing spectrum-aware batch selection. On ImageNet-100, the greedy selector reaches 67.5% top-1 accuracy in 15% less training time than random batching, and in-batch whitening reduces 50-step gradient variance by 1.37x.
We derive non-asymptotic spectral bands that bound the squared InfoNCE gradient norm via alignment, temperature, and batch spectrum, recovering the \(1/\tau^{2}\) law and closely tracking batch-mean gradients on synthetic data and ImageNet. Using effective rank \(R_{\mathrm{eff}}\) as an anisotropy proxy, we design spectrum-aware batch selection, including a fast greedy builder. On ImageNet-100, Greedy-64 cuts time-to-67.5\% top-1 by 15\% vs.\ random (24\% vs.\ Pool--P3) at equal accuracy; CIFAR-10 shows similar gains. In-batch whitening promotes isotropy and reduces 50-step gradient variance by \(1.37\times\), matching our theoretical upper bound.
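The link between effective rank and whitening can be illustrated with a minimal NumPy sketch. It rests on assumptions the abstract does not state: \(R_{\mathrm{eff}}\) is taken to be the entropy-based effective rank (the exponential of the Shannon entropy of the normalized covariance eigenvalues), and whitening is taken to be ZCA whitening computed from the batch covariance. The function names `effective_rank` and `whiten_batch`, the `eps` values, and the synthetic batch are hypothetical and chosen only for illustration; the paper's own definitions may differ.

```python
import numpy as np


def effective_rank(z, eps=1e-12):
    """Entropy-based effective rank of a batch z with shape (n, d).

    Assumed definition: exp of the Shannon entropy of the normalized
    eigenvalues of the centered batch covariance.
    """
    zc = z - z.mean(axis=0, keepdims=True)
    s = np.linalg.svd(zc, compute_uv=False)
    lam = s ** 2                      # proportional to covariance eigenvalues
    p = lam / (lam.sum() + eps)       # normalize to a probability vector
    p = p[p > eps]                    # drop numerically zero directions
    return float(np.exp(-(p * np.log(p)).sum()))


def whiten_batch(z, eps=1e-5):
    """ZCA-whiten a batch using its own covariance (assumed whitening variant)."""
    zc = z - z.mean(axis=0, keepdims=True)
    cov = zc.T @ zc / (z.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)
    w = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
    return zc @ w


rng = np.random.default_rng(0)
# Synthetic anisotropic batch: 256 samples, 16 dimensions, with
# per-dimension standard deviations decaying from 1.0 to 0.1.
scales = np.linspace(1.0, 0.1, 16)
z = rng.normal(size=(256, 16)) * scales

r_before = effective_rank(z)
r_after = effective_rank(whiten_batch(z))
print(f"R_eff before whitening: {r_before:.2f}")
print(f"R_eff after whitening:  {r_after:.2f}")
```

Under these assumptions, a whitened batch has a nearly flat spectrum, so its effective rank approaches the embedding dimension, while the anisotropic batch scores lower. This only illustrates the isotropy claim and does not reproduce the reported gradient-variance reduction.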