ML, AI, IT, LG | Jul 2, 2021

Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization

arXiv:2107.01131v3 | 25 citations
Originality: Highly original
AI Analysis

This work addresses a foundational problem in machine learning: it gives researchers and practitioners a more efficient and theoretically sound mutual information estimator. Its contribution is incremental in the sense that it builds on existing variational bounds.

The paper tackles the limitations of existing contrastive mutual information estimators such as InfoNCE, which require large-batch training and sacrifice tightness for stability. It introduces a new estimator, FLO, that is theoretically tight and provably converges under stochastic gradient descent. Benchmarks verify that FLO learns more efficiently than its predecessors.

Successful applications of InfoNCE and its variants have popularized the use of contrastive variational mutual information (MI) estimators in machine learning. While featuring superior stability, these estimators crucially depend on costly large-batch training, and they sacrifice bound tightness for variance reduction. To overcome these limitations, we revisit the mathematics of popular variational MI bounds through the lens of unnormalized statistical modeling and convex optimization. Our investigation not only yields a new unified theoretical framework encompassing popular variational MI bounds but also leads to a novel, simple, and powerful contrastive MI estimator named FLO. Theoretically, we show that the FLO estimator is tight, and it provably converges under stochastic gradient descent. Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently. The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
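
To make the large-batch dependence mentioned in the abstract concrete, here is a minimal NumPy sketch of the standard InfoNCE estimate computed from a batch of critic scores. It illustrates the baseline the abstract refers to, not the paper's FLO estimator. The function name `infonce_estimate` and the convention that `scores[i, j]` holds the critic value for the pair (x_i, y_j), with matched pairs on the diagonal, are assumptions made for this illustration.

```python
import numpy as np


def infonce_estimate(scores: np.ndarray) -> float:
    """Return the InfoNCE lower bound on MI, in nats.

    `scores` is a K x K matrix (hypothetical layout): scores[i, j] is the
    critic value f(x_i, y_j). Diagonal entries are the paired (positive)
    samples; off-diagonal entries in each row act as negatives.
    """
    k = scores.shape[0]
    # Numerically stable log-sum-exp over each row.
    row_max = scores.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    # log( exp(f_ii) / ((1/K) * sum_j exp(f_ij)) ) = f_ii - lse_i + log K
    return float(np.mean(np.diag(scores) - lse) + np.log(k))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for k in (8, 64, 512):
        # A critic that separates positives from negatives very strongly.
        sharp = 50.0 * np.eye(k) + rng.normal(size=(k, k))
        print(k, infonce_estimate(sharp), np.log(k))
```

Because each diagonal score can never exceed its row's log-sum-exp, the estimate is capped at log K for a batch of K pairs. A high true MI therefore cannot be reported without a correspondingly large batch, which is the cost the abstract describes.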

Code Implementations: 1 repo
