ANCRe: Adaptive Neural Connection Reassignment for Efficient Depth Scaling
This work targets the efficiency of depth scaling in foundation models, offering a lightweight way to improve convergence and depth utilization. The contribution is incremental in that it builds on existing residual connection mechanisms.
The paper tackles the underutilization of deep layers in neural networks by analyzing residual connections from an optimization perspective and introducing ANCRe, a framework that learns adaptive residual connectivities from the data. The reported result is accelerated convergence and improved performance across large language models, diffusion models, and deep ResNets, with less than 1% computational and memory overhead.
Scaling network depth has been a central driver behind the success of modern foundation models, yet recent investigations suggest that deep layers are often underutilized. This paper revisits the default mechanism for deepening neural networks, namely residual connections, from an optimization perspective. Rigorous analysis proves that the layout of residual connections can fundamentally shape convergence behavior, and can even induce an exponential gap in convergence rates. Prompted by this insight, we introduce adaptive neural connection reassignment (ANCRe), a principled and lightweight framework that parameterizes and learns residual connectivities from the data. ANCRe adaptively reassigns residual connections with negligible computational and memory overhead ($<1\%$), while enabling more effective utilization of network depth. Extensive numerical tests across pre-training of large language models, diffusion models, and deep ResNets demonstrate consistently accelerated convergence, boosted performance, and enhanced depth efficiency over conventional residual connections.
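To make the idea of parameterized residual connectivity concrete, the following is a minimal sketch and not the paper's actual formulation, which the abstract does not specify. It assumes one hypothetical parameterization: each layer's skip input is a softmax-weighted mix of all earlier hidden states, with one learnable logit per (layer, earlier state) pair. The names `forward`, `standard_logits`, `connectivity_overhead`, and the toy `layer_fn` are illustrative assumptions. The sketch shows only the forward pass and a parameter count; training the logits is omitted.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a short list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def layer_fn(x, scale):
    # Toy stand-in for a real block (attention, MLP, conv); an assumption.
    return [math.tanh(scale * v) for v in x]

def forward(x, scales, conn_logits):
    """Forward pass with a learnable residual layout (hypothetical form).

    conn_logits[l] holds l + 1 logits, one per earlier state h_0..h_l.
    """
    states = [x]  # h_0 is the network input
    for l, scale in enumerate(scales):
        weights = softmax(conn_logits[l])
        # Skip input: weighted mix of every earlier state, not just h_l.
        skip = [sum(w * h[i] for w, h in zip(weights, states))
                for i in range(len(x))]
        out = layer_fn(states[-1], scale)
        states.append([s + o for s, o in zip(skip, out)])
    return states[-1]

def standard_logits(num_layers, big=50.0):
    # Logits that put almost all weight on the most recent state, which
    # recovers the conventional update h_{l+1} = h_l + f(h_l).
    return [[-big] * l + [big] for l in range(num_layers)]

def connectivity_overhead(num_layers, params_per_layer):
    # Extra scalars introduced by this sketch, relative to layer parameters.
    extra = num_layers * (num_layers + 1) // 2
    return extra / (num_layers * params_per_layer)

if __name__ == "__main__":
    x = [0.5, -1.0, 2.0]
    scales = [0.3, 0.7, 1.1]
    print(forward(x, scales, standard_logits(len(scales))))
    print(connectivity_overhead(num_layers=12, params_per_layer=1000))
```

Under this assumed parameterization, the conventional residual layout is one point in the space of learnable layouts, and the number of added scalars grows quadratically in depth while staying independent of layer width. For realistic layer sizes that count is small relative to the layer parameters, which is consistent in spirit with the overhead figure quoted in the abstract, though the sketch says nothing about the paper's measured compute or memory cost.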