Sharp Spectral Thresholds for Logit Fixed Points

arXiv:2605.1565122.2

Predicted impact: top 84% in LG · last 90 days
Originality: Highly original

AI Analysis

For researchers studying entropy-regularized RL, logit dynamics, and mean-field variational updates, the paper provides a precise stability condition that enlarges the regime in which outcomes are certified to be unique and globally predictable.

The paper identifies the sharp spectral threshold for stability in affine softmax feedback systems, proving that stability holds when β‖ΠWΠ‖_{T→T} < 2, which extends the classical conservative bound and fills the previously missing pre-bifurcation regime.

Softmax feedback systems are a common mathematical core of entropy-regularized reinforcement learning, logit game dynamics, population choice, and mean-field variational updates. Their central stability question is simple: when does a self-reinforcing softmax system produce a unique and globally predictable outcome? Classical theory gives a conservative answer. By treating softmax as a unit-scale response, it certifies stability only in a strongly randomized regime. We prove that the classical approach misses an entire stable regime and does not identify the point at which the qualitative change truly occurs. For finite-dimensional affine logit systems, the sharp dimension-free Euclidean threshold is $$β\|ΠWΠ\|_{\mathcal T\to\mathcal T}<2,$$ rather than the previously used condition, which certifies stability only while the softmax system remains safely over-regularized. Our theorem fills the previously missing pre-bifurcation regime, extending stability guarantees for affine softmax feedback systems to reward-responsive yet globally predictable systems. It enlarges the certified stability boundary for these systems and identifies where the model genuinely undergoes a phase transition.
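The threshold condition can be checked numerically. The sketch below is only an illustration under stated assumptions: the abstract does not define Π or 𝒯, so the code assumes Π is the orthogonal projection onto the sum-zero tangent space of the probability simplex, Π = I − (1/n)11ᵀ, and that the norm is the Euclidean operator norm of ΠWΠ. The update x ← softmax(β(Wx + b)) and all function names are hypothetical choices made for this example, not the paper's notation.

```python
import numpy as np


def tangent_projection(n):
    """Assumed form of Pi: orthogonal projection onto sum-zero vectors."""
    return np.eye(n) - np.ones((n, n)) / n


def projected_gain(W):
    """Largest singular value of Pi W Pi (Euclidean operator norm)."""
    P = tangent_projection(W.shape[0])
    return np.linalg.norm(P @ W @ P, ord=2)


def satisfies_threshold(beta, W):
    """Check the condition beta * ||Pi W Pi|| < 2 from the abstract."""
    return beta * projected_gain(W) < 2.0


def softmax(z):
    """Numerically stable softmax of a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def iterate_logit_map(beta, W, b, x0, steps=200):
    """Iterate the assumed affine logit update x <- softmax(beta (W x + b))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = softmax(beta * (W @ x + b))
    return x


if __name__ == "__main__":
    W = np.eye(2)      # toy self-reinforcing feedback, chosen for illustration
    b = np.zeros(2)
    for beta in (0.9, 1.5, 2.5):
        x = iterate_logit_map(beta, W, b, x0=[0.9, 0.1])
        print(beta, satisfies_threshold(beta, W), x)
```

In this two-action toy case with W = I, the projected norm equals 1, so the check passes for β = 0.9 and β = 1.5 and fails for β = 2.5. The value β = 1.5 lies between a unit-scale bound of the form β‖W‖ < 1 (one reading of the "classical" condition, assumed here) and the threshold of 2, which is the kind of regime the abstract describes as previously uncertified.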
