LG · Jun 19, 2025

SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

Samir Khaki, Xiuyu Li, Junxian Guo, Ligeng Zhu, Chenfeng Xu, Konstantinos N. Plataniotis, Amir Yazdanbakhsh, Kurt Keutzer, Song Han, Zhijian Liu

arXiv:2506.16500v1 · 8 citations · h-index: 27 · ICML

Originality: Highly original

AI Analysis

This paper addresses the slow and expensive fine-tuning of LLMs that researchers and practitioners face, offering a novel acceleration method that is more than an incremental improvement.

The paper tackles the high computational cost of fine-tuning large language models (LLMs) by introducing SparseLoRA, a method that uses contextual sparsity to accelerate fine-tuning. It reduces computational cost by up to 2.2 times and delivers a measured speedup of up to 1.6 times while maintaining accuracy on tasks such as commonsense reasoning and code generation.

Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as QLoRA and DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost. In some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. We also systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to 2.2 times and achieves a measured speedup of up to 1.6 times while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning, code generation, and instruction following.
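To make the idea of an SVD-based sparsity estimator concrete, the following is a minimal NumPy sketch of one way such an estimator could work. It is not the authors' implementation: the function names (`build_svd_estimator`, `select_channels`, `sparse_linear`), the per-channel L2-norm scoring rule, and the `keep_ratio` parameter are all assumptions made for illustration. The sketch factors a weight matrix once with a truncated SVD, uses the cheap low-rank product to guess which output channels matter for the current input, and then computes only those channels densely.

```python
import numpy as np

def build_svd_estimator(W, rank):
    """Offline, training-free step: truncated SVD factors of W (d_out x d_in).
    Hypothetical helper, not taken from the paper."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = Vt[:rank].T * S[:rank]   # (d_in, rank), columns scaled by singular values
    B = U[:, :rank].T            # (rank, d_out)
    return A, B

def select_channels(X, A, B, keep_ratio):
    """Score output channels from the low-rank estimate and keep the top ones.
    The L2-norm score is an assumed criterion."""
    Y_hat = (X @ A) @ B                      # cheap approximation of X @ W.T
    scores = np.linalg.norm(Y_hat, axis=0)   # per-channel magnitude over tokens
    k = max(1, int(round(keep_ratio * scores.shape[0])))
    return np.sort(np.argsort(scores)[-k:])  # indices of the k largest scores

def sparse_linear(X, W, idx):
    """Compute only the selected output channels; the rest stay zero."""
    Y = np.zeros((X.shape[0], W.shape[0]), dtype=X.dtype)
    Y[:, idx] = X @ W[idx].T
    return Y

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))   # toy weight: 8 output channels, 16 inputs
X = rng.standard_normal((4, 16))   # toy batch of 4 tokens
A, B = build_svd_estimator(W, rank=2)
idx = select_channels(X, A, B, keep_ratio=0.5)
Y_sparse = sparse_linear(X, W, idx)
```

Because the selection depends on `X`, a different input can activate a different subset of channels, which is the sense in which the sparsity is contextual. In this sketch the estimator costs two small matrix products, so it only pays off when `rank` is much smaller than the layer dimensions.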
