LG, IT, ML | Nov 16, 2020

Combating the Instability of Mutual Information-based Losses via Regularization

arXiv:2011.07932v4 | 6 citations
AI Analysis

The paper addresses a practical problem for researchers and practitioners who use mutual information-based methods in areas such as supervised and contrastive learning, though the contribution is incremental because it builds on existing losses.

The paper tackles the instability of mutual information-based losses in machine learning. It identifies two symptoms, non-convergence of the network and divergence of the loss, and mitigates both by adding a novel regularization term, which it shows theoretically and experimentally to stabilize training.

Notable progress has been made in numerous fields of machine learning based on neural network-driven mutual information (MI) bounds. However, utilizing the conventional MI-based losses is often challenging due to their practical and mathematical limitations. In this work, we first identify the symptoms behind their instability: (1) the neural network not converging even after the loss seemed to converge, and (2) saturating neural network outputs causing the loss to diverge. We mitigate both issues by adding a novel regularization term to the existing losses. We theoretically and experimentally demonstrate that added regularization stabilizes training. Finally, we present a novel benchmark that evaluates MI-based losses on both the MI estimation power and its capability on the downstream tasks, closely following the pre-existing supervised and contrastive learning settings. We evaluate six different MI-based losses and their regularized counterparts on multiple benchmarks to show that our approach is simple yet effective.
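
To make the two symptoms concrete, here is a minimal sketch in plain Python. The abstract does not state the form of the regularizer, so everything specific here is an assumption: the sketch uses the Donsker-Varadhan bound as a representative MI-based loss, and a squared penalty that pulls the log-partition term toward a fixed target. The names `regularized_dv_loss`, `lam`, and `target` are hypothetical and are not taken from the paper or its code.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def dv_bound(joint_scores, marginal_scores):
    """Donsker-Varadhan lower bound: E_joint[T] - log E_marginal[exp(T)]."""
    mean_joint = sum(joint_scores) / len(joint_scores)
    return mean_joint - log_mean_exp(marginal_scores)


def regularized_dv_loss(joint_scores, marginal_scores, lam=0.1, target=0.0):
    """Negative DV bound plus a squared penalty on the log-partition term.

    ASSUMPTION: the penalty form, `lam`, and `target` are illustrative
    choices, not the regularizer defined in the paper.
    """
    log_z = log_mean_exp(marginal_scores)
    penalty = lam * (log_z - target) ** 2
    return -dv_bound(joint_scores, marginal_scores) + penalty


# Critic outputs on paired (joint) and shuffled (marginal) samples.
joint = [1.0, 2.0, 3.0]
marginal = [0.0, 0.5, -0.5]

# Adding a constant to every critic output leaves the DV bound unchanged,
# so the unregularized loss cannot tell the two critics apart.
shift = 5.0
joint_shifted = [t + shift for t in joint]
marginal_shifted = [t + shift for t in marginal]

print(dv_bound(joint, marginal), dv_bound(joint_shifted, marginal_shifted))
print(regularized_dv_loss(joint, marginal),
      regularized_dv_loss(joint_shifted, marginal_shifted))
```

In this sketch the unregularized bound is identical for the original and shifted critics, which is one way outputs could keep moving while the loss value looks converged. The assumed penalty removes that freedom by making the shifted critic strictly worse.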

Code Implementations: 1 repo
