LG CL ML | Mar 5, 2025

LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach

Hetarth Chopra, Vidhi Rambhia, Vikram Adve

arXiv:2503.03874v2 | 2 citations | h-index: 46

Originality: Incremental advance

AI Analysis

This work addresses a challenge facing AI practitioners: building multi-task LLMs with stronger task-specific performance. It is an incremental improvement over existing model merging techniques.

The paper tackles the problem of improving task-specific performance when merging large language models. It proposes LEWIS, a guided merging framework that uses activation-based layer importance and calibration data to adjust layer-wise sparsity. Compared to unguided methods, the reported improvements are up to 4% for code instruction-following models and up to 11.3% for math-solving models.

As specialized large language models (LLMs) become increasingly prevalent, model merging methods are being used to combine them into a single multi-task model without requiring any additional data or training. However, these approaches fall short when the objective of merging is to increase the downstream model's performance on a particular task-specific benchmark. In this work, we propose LEWIS (Layer Wise Sparsity), a guided model-merging framework that uses activation-based layer importance to dynamically adjust the layer-wise task-vector sparsity required for the merge process. LEWIS uses a calibration dataset to prioritize critical layers during the task-vector pruning process required for model merging. This approach guides existing merging methods by preserving essential layer-wise task-specific knowledge while ensuring the merged model performs best on benchmarks resembling the calibration dataset. Our experiments demonstrate the effectiveness of LEWIS: it improves the performance of code instruction-following and math-solving models created through model merging by up to 4 percent and 11.3 percent, respectively, outperforming unguided data-less model merging approaches that use uniform sparsity.
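
The idea of layer-wise task-vector sparsity can be illustrated with a small NumPy sketch. The abstract does not give the importance formula or the rule that maps importance to sparsity, so everything specific here is an assumption made for illustration: the per-layer importance scores are taken as given (in practice they would come from activations on the calibration data), the linear mapping in `layer_densities` and its `min_density`/`max_density` bounds are hypothetical, and magnitude pruning stands in for whatever pruning step the underlying merge method uses.

```python
import numpy as np

def layer_densities(importance, min_density=0.1, max_density=0.5):
    """Map per-layer importance scores to keep-fractions.

    Hypothetical linear rule: the most important layer keeps max_density
    of its task-vector entries, the least important keeps min_density.
    """
    imp = np.asarray(importance, dtype=float)
    span = imp.max() - imp.min()
    # Normalise to [0, 1]; if all layers tie, fall back to the midpoint.
    norm = (imp - imp.min()) / span if span > 0 else np.full_like(imp, 0.5)
    return min_density + norm * (max_density - min_density)

def magnitude_prune(task_vector, density):
    """Zero all but the largest-magnitude entries of a task vector.

    Roughly density * size entries survive (ties at the threshold
    may keep a few extra).
    """
    flat = np.abs(task_vector).ravel()
    k = max(1, int(round(density * flat.size)))
    threshold = np.partition(flat, flat.size - k)[flat.size - k]
    return np.where(np.abs(task_vector) >= threshold, task_vector, 0.0)

def guided_task_vectors(base, finetuned, importance, **kwargs):
    """Prune each layer's task vector (finetuned - base) at its own density."""
    densities = layer_densities(importance, **kwargs)
    return [magnitude_prune(ft - b, d)
            for b, ft, d in zip(base, finetuned, densities)]
```

With uniform sparsity every layer would receive the same `density`; in this sketch, layers scored as more important on the calibration data simply retain more of their task vector before the pruned vectors are handed to an existing merging method.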
