LG · AI · Mar 11

SCORE: Replacing Layer Stacking with Contractive Recurrent Depth

arXiv:2603.10544v115.31 citations · h-index: 35
Predicted impact: top 64% in LG · last 90 days · Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient and stable optimization in deep learning for researchers and practitioners, offering a lightweight method that is incremental over existing residual connection techniques.

The paper tackles the problem of deep neural network design by proposing SCORE, a discrete recurrent alternative to layer stacking that uses a contractive update inspired by ODEs, which generally improves convergence speed and reduces parameter count across various architectures like graph neural networks, multilayer perceptrons, and Transformers.

Residual connections are central to modern deep neural networks, enabling stable optimization and efficient information flow across depth. In this work, we propose SCORE (Skip-Connection ODE Recurrent Embedding), a discrete recurrent alternative to classical layer stacking. Instead of composing multiple independent layers, SCORE iteratively applies a single shared neural block using an ODE (Ordinary Differential Equation)-inspired contractive update: $h_{t+1} = (1 - dt)\, h_t + dt\, F(h_t)$. This formulation can be interpreted as a depth-by-iteration refinement process, where the step size dt explicitly controls stability and update magnitude. Unlike continuous Neural ODE approaches, SCORE uses a fixed number of discrete iterations and standard backpropagation without requiring ODE solvers or adjoint methods. We evaluate SCORE across graph neural networks (ESOL molecular solubility), multilayer perceptrons, and Transformer-based language models (nanoGPT). Across architectures, SCORE generally improves convergence speed and often accelerates training. It also reduces parameter count through shared weights. In practice, simple Euler integration provides the best trade-off between computational cost and performance, while higher-order integrators yield marginal gains at increased compute. These results suggest that controlled recurrent depth with contractive residual updates offers a lightweight and effective alternative to classical stacking in deep neural networks.
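
The update rule in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the shared block `F` (a single tanh layer), the toy weights `W` and `b`, and the helper names `make_block`, `score_forward`, and `distance` are all hypothetical, chosen only to show one block being reused for a fixed number of Euler-style steps. The sketch also assumes that `F` has a Lipschitz constant below 1, in which case the combined map `(1 - dt) * h + dt * F(h)` pulls any two hidden states closer together at every step.

```python
import math


def make_block(weights, bias):
    """Return a hypothetical shared block F(h) = tanh(W h + b) on plain lists."""
    def block(h):
        return [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(weights, bias)
        ]
    return block


def score_forward(block, h0, dt=0.5, num_steps=4):
    """Apply h <- (1 - dt) * h + dt * F(h) for a fixed number of iterations.

    The same `block` is reused at every step, so depth comes from iteration
    rather than from stacking independently parameterised layers.
    """
    h = list(h0)
    for _ in range(num_steps):
        f_h = block(h)
        h = [(1.0 - dt) * h_i + dt * f_i for h_i, f_i in zip(h, f_h)]
    return h


def distance(a, b):
    """Euclidean distance between two hidden states."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy weights (assumption): small enough that F has Lipschitz constant < 1.
W = [[0.3, -0.2], [0.1, 0.4]]
b = [0.05, -0.1]
F = make_block(W, b)

start_a, start_b = [1.0, -1.0], [-2.0, 3.0]
h_a = score_forward(F, start_a, dt=0.5, num_steps=8)
h_b = score_forward(F, start_b, dt=0.5, num_steps=8)

print("initial gap:", distance(start_a, start_b))
print("final gap:  ", distance(h_a, h_b))
```

With `dt = 1` a step reduces to applying `F` directly, and with `dt = 0` the state never changes, which is one way to read the abstract's remark that the step size controls update magnitude. In this toy setting the gap between the two trajectories shrinks after the eight steps; whether a trained block satisfies the Lipschitz assumption is not something this sketch establishes.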

