CV · AI · May 17, 2025

Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks

arXiv:2505.11881v4 · h-index: 2
Originality: Incremental advance
AI Analysis

This work proposes an incremental modification to residual connections, aimed at researchers and practitioners who want more stable and efficient training of deep networks.

The paper argues that standard residual connections can underutilize a module's capacity. It introduces Orthogonal Residual Update, which adds only the component of a module's output that is orthogonal to the input stream. The authors report improved generalization accuracy and training stability, for example a +3.78 pp top-1 accuracy gain for ViT-B on ImageNet-1k.

Residual connections are pivotal for deep neural networks, enabling greater depth by mitigating vanishing gradients. However, in standard residual updates, the module's output is directly added to the input stream. This can lead to updates that predominantly reinforce or modulate the existing stream direction, potentially underutilizing the module's capacity for learning entirely novel features. In this work, we introduce Orthogonal Residual Update: we decompose the module's output relative to the input stream and add only the component orthogonal to this stream. This design aims to guide modules to contribute primarily new representational directions, fostering richer feature learning while promoting more efficient training. We demonstrate that our orthogonal update strategy improves generalization accuracy and training stability across diverse architectures (ResNetV2, Vision Transformers) and datasets (CIFARs, TinyImageNet, ImageNet-1k), achieving, for instance, a +3.78 pp top-1 accuracy gain for ViT-B on ImageNet-1k.
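The decomposition described in the abstract can be sketched for a single feature vector. This is a minimal illustration in plain Python, not the authors' implementation: the function name `orthogonal_residual_update`, the stabilizing constant `eps`, and the choice to project over one flat vector are all assumptions made for the example. It removes from the module output the part parallel to the stream, then adds what remains.

```python
def orthogonal_residual_update(x, f_out, eps=1e-6):
    """Add only the part of f_out that is orthogonal to the stream x.

    x     : list of floats, the input stream (hypothetical single vector)
    f_out : list of floats, the module's output for that stream
    eps   : assumed small constant guarding against division by zero
    """
    # Inner product <f_out, x> and squared norm ||x||^2.
    dot = sum(fi * xi for fi, xi in zip(f_out, x))
    norm_sq = sum(xi * xi for xi in x)

    # Coefficient of the projection of f_out onto x.
    scale = dot / (norm_sq + eps)

    # Orthogonal component: f_out minus its projection onto x.
    f_orth = [fi - scale * xi for fi, xi in zip(f_out, x)]

    # Residual update that uses only the orthogonal component.
    return [xi + oi for xi, oi in zip(x, f_orth)]


# Example: the stream points along the first axis, so the first
# coordinate of f_out is (almost entirely) discarded.
x = [1.0, 0.0]
f_out = [3.0, 4.0]
y = orthogonal_residual_update(x, f_out)  # approximately [1.0, 4.0]
```

A standard residual update would return `[4.0, 4.0]` here. In this sketch the component of `f_out` along `x` is dropped, so the update changes the stream only in directions it does not already occupy.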

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
