LG · AI · May 20

TBP-mHC: full expressivity for manifold-constrained hyper connections through transportation polytopes

arXiv:2605.21724 · 8.0
Predicted impact: top 35% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For deep learning practitioners using residual networks, this provides a more efficient and stable method for learnable mixing across multiple residual streams.

The paper introduces Transportation Birkhoff Polytope (TBP) parameterizations for manifold-constrained hyper-connections, achieving exact double stochasticity with full expressivity and $(n-1)^2$ degrees of freedom without iterative normalization or combinatorial explosion. Experiments on language model pre-training show competitive performance with improved stability and scalability.
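
The $(n-1)^2$ figure can be checked with a standard dimension count for the Birkhoff polytope. The derivation below is a general fact about doubly stochastic matrices, not a description of the TBP construction itself; the symbol $M$ is introduced here only for illustration.

```latex
% An n x n matrix M is doubly stochastic when
%   M_{ij} \ge 0, every row sums to 1, and every column sums to 1.
\[
  \sum_{j=1}^{n} M_{ij} = 1 \quad (i = 1,\dots,n),
  \qquad
  \sum_{i=1}^{n} M_{ij} = 1 \quad (j = 1,\dots,n).
\]
% These are 2n linear equations, but one is redundant: summing all row
% constraints and summing all column constraints both give
% \sum_{i,j} M_{ij} = n. So only 2n - 1 are independent, and
\[
  \dim = n^2 - (2n - 1) = (n-1)^2 .
\]
% Example: n = 4 residual streams give (4-1)^2 = 9 free parameters.
```

Equivalently, fixing the top-left $(n-1) \times (n-1)$ block determines the last row and column through the sum constraints, provided the remaining entries come out non-negative.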

Hyper-Connections (HC) improve residual networks by introducing learnable mixing across multiple residual streams, but unconstrained mixing leads to training instability. Manifold-Constrained Hyper-Connections (mHC) address this by enforcing approximate double stochasticity via Sinkhorn normalization, while mHC-lite ensures exact constraints through convex combinations of permutation matrices at the cost of factorial complexity. KromHC reduces this cost using Kronecker-product parameterizations, but restricts the mixing matrices to a structured submanifold of the Birkhoff polytope. We propose Transportation Birkhoff Polytope (TBP) parameterizations and their Recursive variants (RTBP), which construct exactly doubly stochastic mixing matrices with $(n-1)^2$ degrees of freedom. Our approach avoids iterative normalization and combinatorial explosion while preserving full expressivity of the Birkhoff polytope. Empirical results on language model pre-training demonstrate competitive performance with improved stability and scalability.
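
To make the trade-offs among the earlier approaches concrete, here is a minimal NumPy sketch of the three generic building blocks the abstract names: Sinkhorn normalization, a convex combination of permutation matrices, and a Kronecker product of small doubly stochastic factors. It does not reproduce mHC, mHC-lite, KromHC, or TBP; the function names, the softmax weighting, and the iteration count are assumptions chosen for illustration.

```python
import itertools
import numpy as np


def sinkhorn(logits, n_iters=20):
    """Alternately normalize rows and columns of exp(logits).

    After a finite number of iterations the columns sum to 1 exactly
    (the last step), but the rows only approximately.
    """
    m = np.exp(logits - logits.max())  # positive entries, numerically stable
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # make rows sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # make columns sum to 1
    return m


def permutation_mixture(weight_logits, n):
    """Softmax-weighted convex combination of all n! permutation matrices.

    Exactly doubly stochastic up to floating-point rounding, but the
    number of weights grows factorially with n.
    """
    perms = list(itertools.permutations(range(n)))
    w = np.exp(weight_logits - weight_logits.max())
    w = w / w.sum()  # convex weights: non-negative, sum to 1
    eye = np.eye(n)
    # eye[list(p)] reorders identity rows, giving a permutation matrix.
    return sum(wk * eye[list(p)] for wk, p in zip(w, perms))


def kronecker_mixture(a, b):
    """Kronecker product of two doubly stochastic matrices.

    The result is doubly stochastic, but only matrices with this
    product structure are reachable.
    """
    return np.kron(a, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    s = sinkhorn(rng.normal(size=(n, n)))
    p = permutation_mixture(rng.normal(size=24), n)  # 4! = 24 weights
    a = permutation_mixture(rng.normal(size=2), 2)   # 2x2 factor
    b = permutation_mixture(rng.normal(size=2), 2)   # 2x2 factor
    k = kronecker_mixture(a, b)
    for name, m in [("sinkhorn", s), ("perm-mix", p), ("kronecker", k)]:
        row_err = np.abs(m.sum(axis=1) - 1).max()
        col_err = np.abs(m.sum(axis=0) - 1).max()
        print(f"{name}: max row error {row_err:.2e}, max col error {col_err:.2e}")
```

For $n = 4$, the permutation mixture needs $4! = 24$ weights to cover a 9-dimensional set, while the Kronecker product of two $2 \times 2$ factors has only 2 free parameters. This is the gap between factorial cost and restricted expressivity that the abstract describes.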

