LG · AI · Jan 30

Avoiding Premature Collapse: Adaptive Annealing for Entropy-Regularized Structural Inference

arXiv:2601.23039v3 · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses a critical stability issue in differentiable matching layers for structural prediction; the contribution is incremental but important for scaling architectures.

The paper tackles the instability of recovering discrete permutations via entropy-regularized Optimal Transport during structural inference, and proposes an adaptive scheduling algorithm that stabilizes training and prevents late-stage gradient explosions on the FineWeb-Edu dataset.

Differentiable matching layers and residual connection paradigms, often implemented via entropy-regularized Optimal Transport (OT), serve as critical mechanisms in structural prediction and architectural scaling. However, recovering discrete permutations or maintaining identity mappings via annealing $\varepsilon \to 0$ is notoriously unstable. In this work, we identify a fundamental mechanism for this failure: Premature Mode Collapse. By analyzing the non-normal dynamics of the Sinkhorn fixed-point map, we reveal a theoretical thermodynamic speed limit: standard exponential cooling outpaces the contraction rate of the inference operator, which degrades as $O(1/\varepsilon)$. To address this, we propose Efficient Piecewise Hybrid Adaptive Stability Control (EPH-ASC), an adaptive scheduling algorithm that monitors the stability of the inference process. We demonstrate that EPH-ASC is essential for stabilizing Manifold-Constrained Hyper-Connections (mHC) during large-scale training on the FineWeb-Edu dataset, effectively preventing late-stage gradient explosions by enforcing a linear stability law.
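
The abstract does not spell out the EPH-ASC update rule, so the sketch below is a hypothetical stand-in, not the authors' algorithm. It pairs a log-domain Sinkhorn solver (uniform marginals, square cost matrix) with an annealing loop that lowers ε geometrically, which is the plain exponential cooling the abstract criticizes, but only while the warm-started solve converges within a fixed sweep budget. When a solve fails to converge, the loop retries from the last stable ε with a halved cooling rate. The names `sinkhorn_log`, `transport_plan`, and `adaptive_anneal`, the budget-based stability test, and the toy cost matrix are all illustrative assumptions.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x))) for a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sinkhorn_log(cost: Matrix, eps: float,
                 f: Optional[List[float]] = None, g: Optional[List[float]] = None,
                 max_iter: int = 1000, tol: float = 1e-9
                 ) -> Tuple[List[float], List[float], int, bool]:
    """Log-domain Sinkhorn with uniform 1/n marginals on a square cost matrix.

    Returns potentials (f, g), sweeps used, and whether the largest potential
    update in the last sweep fell below `tol`.
    """
    n = len(cost)
    log_a = -math.log(n)  # log of the uniform marginal 1/n
    f = [0.0] * n if f is None else list(f)
    g = [0.0] * n if g is None else list(g)
    for sweep in range(1, max_iter + 1):
        # Row update: makes row sums exactly 1/n given the current g.
        f_new = [eps * (log_a - logsumexp([(g[j] - cost[i][j]) / eps for j in range(n)]))
                 for i in range(n)]
        # Column update: makes column sums exactly 1/n given the new f.
        g_new = [eps * (log_a - logsumexp([(f_new[i] - cost[i][j]) / eps for i in range(n)]))
                 for j in range(n)]
        delta = max(abs(u - v) for u, v in zip(f_new + g_new, f + g))
        f, g = f_new, g_new
        if delta < tol:
            return f, g, sweep, True
    return f, g, max_iter, False


def transport_plan(cost: Matrix, eps: float, f: List[float], g: List[float]) -> Matrix:
    """Entropic plan P_ij = exp((f_i + g_j - C_ij) / eps)."""
    n = len(cost)
    return [[math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(n)] for i in range(n)]


def adaptive_anneal(cost: Matrix, eps0: float = 1.0, eps_min: float = 0.01,
                    decay: float = 0.5, budget: int = 1000, max_steps: int = 40):
    """Hypothetical adaptive schedule (not EPH-ASC): cool eps only while the
    warm-started solve converges within `budget` sweeps; otherwise back off."""
    eps, f, g, last_ok = eps0, None, None, None
    history = []  # (eps, sweeps) for every accepted step
    for _ in range(max_steps):
        f_try, g_try, sweeps, ok = sinkhorn_log(cost, eps, f, g, max_iter=budget)
        if ok:
            f, g, last_ok = f_try, g_try, eps
            history.append((eps, sweeps))
            if eps <= eps_min:
                break
            eps = max(eps * decay, eps_min)  # plain geometric (exponential) cooling step
        elif last_ok is None:
            raise RuntimeError("eps0 is too small for the sweep budget")
        else:
            # Cooling outpaced the solver: halve the cooling rate and retry
            # from the last stable eps, warm-starting from its potentials.
            decay = min(0.99, 1.0 - 0.5 * (1.0 - decay))
            eps = max(last_ok * decay, eps_min)
    return f, g, last_ok, history


# Toy 3x3 cost whose unique optimal assignment is the identity permutation.
cost = [[0.0, 0.4, 0.9],
        [0.5, 0.1, 0.3],
        [0.8, 0.6, 0.2]]
f, g, eps, history = adaptive_anneal(cost)
plan = transport_plan(cost, eps, f, g)
for eps_k, sweeps in history:
    print(f"eps={eps_k:.4f}  sweeps={sweeps}")
print("row-wise argmax:", [row.index(max(row)) for row in plan])
```

The sweep count of the warm-started solve is used here only as a crude, observable proxy for the contraction rate of the Sinkhorn map; the paper's actual stability monitor and its linear stability law are not reproduced.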

