CL | Apr 5, 2025

STEP: Staged Parameter-Efficient Pre-training for Large Language Models

arXiv:2504.04151v1 | 17 citations | h-index: 3 | ACL
Originality: Incremental advance
AI Analysis

The paper addresses memory efficiency for researchers and practitioners who train large language models, though it appears incremental because it builds on existing parameter-efficient tuning techniques.

The paper tackles the memory challenges of pre-training large language models by introducing STEP, a method that integrates parameter-efficient tuning with model growth, achieving up to a 53.9% reduction in maximum memory requirements while maintaining equivalent performance.

Pre-training large language models (LLMs) faces significant memory challenges due to the large number of model parameters. We introduce STaged parameter-Efficient Pre-training (STEP), which integrates parameter-efficient tuning techniques with model growth. We conduct experiments on pre-training LLMs of various sizes and demonstrate that STEP achieves up to a 53.9% reduction in maximum memory requirements compared to vanilla pre-training while maintaining equivalent performance. Furthermore, we show that a model trained with STEP performs comparably to vanilla pre-trained models on downstream tasks after instruction tuning.
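
To make the memory argument concrete, here is a rough back-of-the-envelope sketch of why growing a model in stages, with earlier parameters frozen and tuned only through small adapters, can lower peak training memory. Everything in it is an assumption for illustration and not the paper's accounting: the 16 bytes per trainable parameter (a common mixed-precision Adam estimate), the 2 bytes per frozen parameter, the hypothetical `adapter_fraction`, the stage sizes, and the decision to ignore activation memory. It is not expected to reproduce the reported 53.9% figure.

```python
# Illustrative estimate only. Byte counts and the staging scheme below are
# assumptions, not taken from the STEP paper.

# Assumed mixed-precision Adam cost per trainable parameter:
# fp16 weight (2) + fp16 gradient (2) + fp32 master weight (4)
# + fp32 first moment (4) + fp32 second moment (4) = 16 bytes.
BYTES_PER_TRAINABLE_PARAM = 16
# Assumed cost per frozen parameter: fp16 weight only.
BYTES_PER_FROZEN_PARAM = 2


def training_memory_bytes(trainable_params: int, frozen_params: int = 0) -> int:
    """Parameter-related memory (weights, gradients, optimizer states)."""
    return (trainable_params * BYTES_PER_TRAINABLE_PARAM
            + frozen_params * BYTES_PER_FROZEN_PARAM)


def vanilla_peak_bytes(total_params: int) -> int:
    """All parameters are trainable for the whole run."""
    return training_memory_bytes(total_params)


def staged_peak_bytes(stage_sizes: list[int], adapter_fraction: float) -> int:
    """Peak memory over stages of a hypothetical staged schedule.

    stage_sizes holds the cumulative parameter count after each stage.
    In every stage, parameters from earlier stages are frozen and receive
    adapters with adapter_fraction * (frozen count) trainable parameters;
    newly added parameters are fully trainable.
    """
    peak = 0
    previous = 0
    for size in stage_sizes:
        new_params = size - previous
        adapter_params = round(previous * adapter_fraction)
        stage_memory = training_memory_bytes(new_params + adapter_params,
                                             frozen_params=previous)
        peak = max(peak, stage_memory)
        previous = size
    return peak


if __name__ == "__main__":
    vanilla = vanilla_peak_bytes(1_000_000_000)
    staged = staged_peak_bytes([500_000_000, 1_000_000_000], adapter_fraction=0.02)
    print(f"vanilla peak: {vanilla / 1e9:.2f} GB")
    print(f"staged peak:  {staged / 1e9:.2f} GB")
    print(f"reduction:    {1 - staged / vanilla:.1%}")
```

Under these assumed numbers the staged schedule peaks in its second stage, where half of the parameters are frozen. The actual savings depend on the growth schedule, adapter size, and activation memory, none of which are modelled here.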

