LG | AI | CL | Feb 19, 2025

LESA: Learnable LLM Layer Scaling-Up

Yifei Yang, Zouying Cao, Xinbei Ma, Yao Yao, Libo Qin, Zhi Chen, Hai Zhao

arXiv:2502.13794v1 | 17.96 citations | h-index: 10 | Has Code | ACL

Originality: Highly original

AI Analysis

This work addresses the high computational cost of scaling up LLMs and offers a more efficient alternative to heuristic-based methods, though it is an incremental improvement over existing scaling techniques.

The paper tackles the problem of expensive training for Large Language Models by proposing LESA, a learnable method for depth scaling-up that uses neural networks to predict inserted layer parameters, achieving superior performance with less than half the computational cost during continual pre-training.

Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. However, existing depth scaling-up methods rely on empirical heuristic rules for layer duplication, which result in poorer initialization and slower convergence during continual pre-training. We propose LESA, a novel learnable method for depth scaling-up. By concatenating parameters from each layer and applying Singular Value Decomposition, we uncover latent patterns between layers, suggesting that inter-layer parameters can be learned. LESA uses a neural network to predict the parameters inserted between adjacent layers, enabling better initialization and faster training. Experiments show that LESA outperforms existing baselines, achieving superior performance with less than half the computational cost during continual pre-training. Extensive analyses demonstrate its effectiveness across different model sizes and tasks.
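The pipeline described in the abstract (concatenate per-layer parameters, apply SVD, predict an inserted layer from its neighbors) can be illustrated with a toy NumPy sketch. This is not the authors' implementation. The layer shapes, the synthetic weights, and the name `predict_between` are assumptions made for illustration, and a least-squares linear map stands in for the neural network that LESA trains.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one weight matrix type across 8 layers (hypothetical shapes).
# The layers drift smoothly with depth so that neighbors carry signal.
num_layers, d_out, d_in = 8, 6, 4
base = rng.normal(size=(d_out, d_in))
drift = rng.normal(size=(d_out, d_in))
layers = [
    base + 0.1 * i * drift + 0.01 * rng.normal(size=(d_out, d_in))
    for i in range(num_layers)
]

# 1) Concatenate: one flattened row per layer, then take a shared SVD basis.
stacked = np.stack([w.ravel() for w in layers])   # (num_layers, d_out * d_in)
u, s, vt = np.linalg.svd(stacked, full_matrices=False)
coeffs = u * s                                    # per-layer coordinates in the basis

# 2) Fit a predictor from the two neighbors' coordinates to the middle layer's.
#    Assumption: plain least squares replaces the learned neural network.
x = np.hstack([coeffs[:-2], coeffs[2:]])          # features from layers i-1 and i+1
y = coeffs[1:-1]                                  # targets: layer i
weights, *_ = np.linalg.lstsq(x, y, rcond=None)

# 3) Predict a new layer to insert between adjacent layers i and i+1.
def predict_between(i):
    features = np.hstack([coeffs[i], coeffs[i + 1]])
    return (features @ weights @ vt).reshape(d_out, d_in)

new_layer = predict_between(3)
print(new_layer.shape)
```

The predicted matrix would serve only as an initialization for the inserted layer. According to the abstract, the scaled-up model is then trained further with continual pre-training.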
