LG | AI | Feb 16, 2025

CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation

arXiv:2502.10940v3 | 24 citations | h-index: 7 | EMNLP
Originality: Highly original
AI Analysis

This paper addresses the resource-intensive pre-training of LLMs, a problem faced by AI researchers and practitioners, and offers an incremental efficiency improvement.

The paper tackles the high computational cost of pre-training large language models (LLMs) by proposing CoLA, a method that replaces full-size layers with compute-efficient auto-encoders to enforce low-rank activations, reducing computing cost by 2× and improving training throughput by 1.86× while maintaining performance.
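To make the source of the compute saving concrete, the arithmetic below compares the weight count (which equals the multiply-adds per token) of a dense layer with that of a two-matrix bottleneck. This is only a sketch: the layer sizes and the rank are hypothetical values chosen for illustration, not settings taken from the paper, and the function names are made up for this example.

```python
def full_layer_cost(d_in: int, d_out: int) -> int:
    """Weights (and multiply-adds per token) of a dense d_in -> d_out layer."""
    return d_in * d_out


def bottleneck_cost(d_in: int, d_out: int, rank: int) -> int:
    """Weights (and multiply-adds per token) of a d_in -> rank -> d_out pair."""
    return rank * (d_in + d_out)


# Hypothetical sizes, for illustration only (not the paper's configuration).
d_in, d_out, rank = 4096, 4096, 1024

full = full_layer_cost(d_in, d_out)
low = bottleneck_cost(d_in, d_out, rank)
ratio = full / low

print(f"full={full:,}  bottleneck={low:,}  ratio={ratio:.2f}x")
# prints: full=16,777,216  bottleneck=8,388,608  ratio=2.00x
```

For a square layer the bottleneck is cheaper whenever the rank is below half the width, and the saving grows as the rank shrinks. The end-to-end speed-up of a full model also depends on parts this count ignores, such as attention score computation and embeddings.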

The full-size MLPs and the projection layers in attention introduce tremendous model sizes of large language models (LLMs), consuming extensive computational resources in pre-training. We empirically observe that the activations of pre-trained LLMs exhibit low-rank property. Motivated by such observations, we propose CoLA and its memory-efficient implementation, CoLA-M, to replace these full-size layers with compute-efficient auto-encoders that naturally enforce low-rank activations throughout training. This fundamental architectural change eliminates the activation redundancy and significantly boosts model capacity and training efficiency. Experiments on LLaMA models with 60 million to 7 billion parameters show that CoLA reduces the computing cost by $\bf 2\pmb{\times}$ and improves training throughput by $\bf 1.86\pmb{\times}$ while maintaining full-rank level performance. CoLA-M further squeezes memory cost without sacrificing throughput, offering a pre-training approach with collectively superior parameter, computing, and memory efficiency. The LLMs produced are also $\bf 2\pmb{\times}$ smaller, enabling faster inference with lower memory cost on resource-constrained platforms.
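The auto-encoder replacement described in the abstract can be pictured with a small pure-Python sketch. It assumes the layer has the form decoder(activation(encoder(x))) with a SiLU nonlinearity at the bottleneck. The abstract only says "auto-encoders", so the activation choice, the initialization, and the class name `BottleneckLayer` are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def silu(z):
    """SiLU activation: z * sigmoid(z). Assumed here, not confirmed by the source."""
    return z / (1.0 + math.exp(-z))


class BottleneckLayer:
    """Hypothetical d_in -> rank -> d_out layer standing in for a dense d_in -> d_out layer."""

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        # Encoder has shape (rank, d_in); decoder has shape (d_out, rank).
        self.encoder = [[rng.gauss(0.0, d_in ** -0.5) for _ in range(d_in)]
                        for _ in range(rank)]
        self.decoder = [[rng.gauss(0.0, rank ** -0.5) for _ in range(rank)]
                        for _ in range(d_out)]

    def __call__(self, x):
        # The intermediate code has only `rank` entries, so the output is
        # always a combination of at most `rank` decoder columns.
        code = [silu(z) for z in matvec(self.encoder, x)]
        return matvec(self.decoder, code)

    def num_parameters(self):
        return (sum(len(row) for row in self.encoder)
                + sum(len(row) for row in self.decoder))


layer = BottleneckLayer(d_in=8, d_out=8, rank=2)
y = layer([1.0] * 8)
print(len(y), layer.num_parameters())  # prints: 8 32
```

A dense 8-by-8 layer would hold 64 weights, so this toy bottleneck uses half as many while keeping the same input and output widths. Because every output passes through the rank-sized code, the low-rank structure is built into the layer rather than imposed afterwards.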

Code Implementations: 1 repo
