LAuReL: Learned Augmented Residual Layer
This work addresses the need for more efficient and effective architectural components in deep learning models. It proposes a novel replacement for residual connections that improves both vision and language models at minimal computational cost, though the contribution appears incremental because it builds on the existing residual connection paradigm.
The paper introduces LAuReL, a learned augmented residual layer that generalizes the residual connection. On ResNet-50 it achieves 60% of the gains from adding an extra layer while adding only 0.003% more parameters. On 1B and 4B parameter LLMs it improves performance on downstream tasks by 2.54% to 20.05% while adding only 0.012% and 0.1% more parameters, respectively.
One of the core pillars of efficient deep learning methods is architectural improvements such as the residual/skip connection, which has led to significantly better model convergence and quality. Since then, the residual connection has become ubiquitous not just in convolutional neural networks but also in transformer-based architectures, the backbone of LLMs. In this paper we introduce Learned Augmented Residual Layer (LAuReL) -- a novel generalization of the canonical residual connection -- intended as an in-situ replacement for the latter that outperforms it on both model quality and footprint metrics. Our experiments show that using LAuReL can help boost performance for both vision and language models. For example, on the ResNet-50, ImageNet 1K task, it achieves 60% of the gains from adding an extra layer while adding only 0.003% more parameters, and matches the gains of the extra layer while adding 2.6 times fewer parameters. Similarly, when pre-training 1B and 4B parameter LLMs, LAuReL improves performance on a variety of challenging downstream evaluation tasks by 2.54% to 20.05%, while adding only 0.012% and 0.1% additional parameters, respectively.
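To make the idea of "generalizing" a residual connection concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's definition: the abstract does not give LAuReL's exact form, so `weighted_residual` and its scalar weights `alpha` and `beta` are hypothetical stand-ins for one simple way learned parameters could be attached to a skip connection. The helper `extra_params` only does the overhead arithmetic from the abstract, treating "1B" and "4B" as nominal counts of 10^9 and 4 x 10^9 parameters.

```python
from typing import Callable, List

Vector = List[float]


def residual(f: Callable[[Vector], Vector], x: Vector) -> Vector:
    """Canonical residual connection: f(x) + x."""
    return [fx + xi for fx, xi in zip(f(x), x)]


def weighted_residual(
    f: Callable[[Vector], Vector], x: Vector, alpha: float, beta: float
) -> Vector:
    """Hypothetical generalization: alpha * f(x) + beta * x.

    alpha and beta stand in for learned scalars (an assumption for
    illustration). With alpha = beta = 1 this reduces to the canonical
    residual connection.
    """
    return [alpha * fx + beta * xi for fx, xi in zip(f(x), x)]


def extra_params(base_params: int, overhead_percent: float) -> float:
    """Absolute parameter count implied by a percentage overhead."""
    return base_params * overhead_percent / 100.0


def double(x: Vector) -> Vector:
    """Toy stand-in for a layer's nonlinear function f."""
    return [2.0 * v for v in x]


x = [1.0, -2.0, 0.5]
plain = residual(double, x)
same = weighted_residual(double, x, alpha=1.0, beta=1.0)
scaled = weighted_residual(double, x, alpha=0.5, beta=1.0)

# Overheads quoted in the abstract, applied to nominal model sizes.
overhead_1b = extra_params(1_000_000_000, 0.012)
overhead_4b = extra_params(4_000_000_000, 0.1)

print(plain, same, scaled)
print(round(overhead_1b), round(overhead_4b))
```

Under these assumptions, the quoted overheads work out to roughly 120 thousand extra parameters for the 1B model and 4 million for the 4B model, which gives a sense of how small the additions are relative to the base networks.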