CL, AI, LG | Mar 27, 2025

Boosting Large Language Models with Mask Fine-Tuning

arXiv:2503.22764v1 | 2 citations | h-index: 13
Originality: Highly original
AI Analysis

This work targets fine-tuning efficiency and downstream performance for LLM users, and it presents a new paradigm rather than an incremental improvement.

The authors challenge the assumption that a model must be kept intact during fine-tuning. They introduce Mask Fine-Tuning (MFT), which learns binary masks that deliberately break the integrity of a well-trained model. Reported gains include average improvements of 1.95% and 1.88% on coding tasks with LLaMA2-7B and LLaMA3.1-8B, respectively.

The model is usually kept integral in the mainstream large language model (LLM) fine-tuning protocols. No works have questioned whether maintaining the integrity of the model is indispensable for performance. In this work, we introduce Mask Fine-Tuning (MFT), a brand-new LLM fine-tuning paradigm to show that properly breaking the integrity of the model can surprisingly lead to improved performance. Specifically, MFT learns a set of binary masks supervised by the typical LLM fine-tuning objective. Extensive experiments show that MFT gains a consistent performance boost across various domains and backbones (e.g., 1.95%/1.88% average gain in coding with LLaMA2-7B/3.1-8B). Detailed procedures are provided to study the proposed MFT from different hyperparameter perspectives for better insight. In particular, MFT naturally updates the current LLM training protocol by deploying it on a complete well-trained model. This study extends the functionality of mask learning from its conventional network pruning context for model compression to a more general scope.
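The idea of learning a binary mask over frozen weights with an ordinary training loss can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the four-weight linear model, the squared-error loss, the real-valued `scores` that are thresholded into a mask, and the straight-through gradient (a common trick in mask learning, not confirmed here as the authors' estimator). The toy task is built so that the last frozen weight is harmful, which lets the mask improve the model without changing any weight.

```python
from itertools import product

def binarize(scores):
    """Threshold real-valued scores into a 0/1 mask."""
    return [1.0 if s > 0.0 else 0.0 for s in scores]

def predict(weights, mask, x):
    """Linear model whose frozen weights are gated elementwise by the mask."""
    return sum(w * m * xi for w, m, xi in zip(weights, mask, x))

def mean_squared_error(weights, mask, data):
    return sum((predict(weights, mask, x) - y) ** 2 for x, y in data) / len(data)

def learn_mask(weights, data, lr=0.01, epochs=5):
    """Learn mask scores by SGD; the weights themselves are never updated."""
    scores = [1.0] * len(weights)  # all-ones mask, i.e. start from the intact model
    for _ in range(epochs):
        for x, y in data:
            mask = binarize(scores)
            err = predict(weights, mask, x) - y
            # Straight-through estimator (assumption): treat d(mask)/d(score) as 1,
            # so the gradient of err**2 w.r.t. score i is 2 * err * w_i * x_i.
            for i in range(len(scores)):
                scores[i] -= lr * 2 * err * weights[i] * x[i]
    return binarize(scores)

# Hypothetical "well-trained" model: the last weight hurts on this toy task.
frozen_weights = [1.0, -2.0, 0.5, 3.0]
target_mask = [1.0, 1.0, 1.0, 0.0]  # used only to generate the toy targets
inputs = [list(x) for x in product([-1.0, 1.0], repeat=4)]
data = [(x, predict(frozen_weights, target_mask, x)) for x in inputs]

intact_mask = [1.0] * len(frozen_weights)
learned_mask = learn_mask(frozen_weights, data)

print("intact MSE :", mean_squared_error(frozen_weights, intact_mask, data))
print("masked MSE :", mean_squared_error(frozen_weights, learned_mask, data))
print("mask       :", learned_mask)  # [1.0, 1.0, 1.0, 0.0]
```

In this sketch the mask starts as all ones, so training begins from the complete model and only switches off entries that the loss penalizes. That mirrors the abstract's description of deploying MFT on a complete well-trained model. A real LLM setting would apply masks to large weight tensors and use the fine-tuning objective instead of squared error.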

Code Implementations: 1 repo