CL | Jun 25, 2025

GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching

arXiv:2506.20480v1 | 2 citations | h-index: 11 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the deployment challenges LLMs pose for users who need efficient inference, though it is an incremental advance that builds on existing pruning methods.

The paper tackles the problem of compressing large language models (LLMs) to reduce deployment costs, using a strategy that combines layers from finetuned variants of the same model. For Llama2-13B, the compressed model retains approximately 97.3% of the original performance while removing about 25% of the parameters.
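To make the "about 25% of parameters" figure concrete, the sketch below counts Llama2-13B parameters from its published architecture (hidden size 5120, 40 decoder layers, MLP size 13824, vocabulary 32000, untied embedding and LM head) and reports what share disappears when whole decoder layers are dropped. This is illustrative arithmetic under those assumptions only: the helper names are hypothetical, and the number of layers the paper actually removes is not stated on this page.

```python
# Illustrative arithmetic, assuming the public Llama2-13B configuration.
# Not the paper's pruned configuration; helper names are hypothetical.
HIDDEN = 5120     # model (hidden) dimension
N_LAYERS = 40     # decoder layers
MLP = 13824       # intermediate size of the gated MLP
VOCAB = 32000     # tokenizer vocabulary size


def layer_params(hidden: int, mlp: int) -> int:
    """Parameters in one decoder layer (Llama projections have no biases)."""
    attention = 4 * hidden * hidden   # Q, K, V and output projections
    feed_forward = 3 * hidden * mlp   # gate, up and down projections
    norms = 2 * hidden                # two RMSNorm weight vectors
    return attention + feed_forward + norms


def total_params(n_layers: int) -> int:
    """Whole-model count: decoder stack + input embedding + LM head + final norm."""
    embeddings = 2 * VOCAB * HIDDEN   # embedding and LM head are not tied
    final_norm = HIDDEN
    return n_layers * layer_params(HIDDEN, MLP) + embeddings + final_norm


full = total_params(N_LAYERS)
print(f"full model: {full:,} parameters")
for removed in (8, 10, 12):
    pruned = total_params(N_LAYERS - removed)
    print(f"remove {removed:2d} layers -> {1 - pruned / full:.1%} of parameters removed")
```

Under these assumptions the full model comes out at 13,015,864,320 parameters, and dropping 10 of the 40 decoder layers removes about 24.4% of them. Layer-level pruning therefore lands near the quoted ~25% with roughly a quarter of the layers removed, because the embedding and LM head are only about 2.5% of the total.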

Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges in deployment and inference. While structured pruning of model parameters offers a promising way to reduce computational costs at deployment time, current methods primarily focus on single-model pruning. In this work, we develop a novel strategy to compress models by strategically combining or merging layers from finetuned model variants, which preserves the original model's abilities by aggregating capabilities accentuated in different finetunes. We pose the optimal tailoring of these LLMs as a zero-order optimization problem, adopting a search space that supports three different operations: (1) layer removal, (2) layer selection from different candidate models, and (3) layer merging. Our experiments demonstrate that this approach leads to competitive model pruning. For example, for the Llama2-13B model family, our compressed models maintain approximately 97.3% of the original performance while removing ~25% of parameters, significantly outperforming previous state-of-the-art methods. The code is available at https://github.com/Guinan-Su/auto-merge-llm.
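The search space described above can be pictured as one operation per layer position: remove the layer, select it from one candidate finetune, or merge the candidates' versions of it. The toy sketch below encodes that idea and searches it with plain random search, which stands in for a zero-order optimizer because it only ever calls the objective and never uses gradients. Everything here is an assumption for illustration: layers are short lists of floats, "merge" is taken to mean a weighted average of the same-position layer across candidates, the number of removed layers is fixed, and `tailor`, `merge_layers`, `sample_config`, `random_search` and `toy_score` are hypothetical names. The authors' actual optimizer and objective are not reproduced here.

```python
import random
from typing import Callable, List, Sequence

Layer = List[float]   # toy stand-in for one decoder layer's weights (assumption)
Model = List[Layer]
Op = tuple            # ("remove",) | ("select", k) | ("merge", (w_0, ..., w_{K-1}))


def merge_layers(layers: Sequence[Layer], weights: Sequence[float]) -> Layer:
    """Weighted average of the same-position layer taken from several candidates."""
    total = sum(weights)
    return [sum(w * layer[j] for w, layer in zip(weights, layers)) / total
            for j in range(len(layers[0]))]


def tailor(candidates: Sequence[Model], config: Sequence[Op]) -> Model:
    """Build a compressed model by applying one operation per layer position."""
    out: Model = []
    for i, op in enumerate(config):
        if op[0] == "remove":
            continue                                   # layer removal
        if op[0] == "select":
            out.append(list(candidates[op[1]][i]))     # layer selection
        else:
            out.append(merge_layers([m[i] for m in candidates], op[1]))  # layer merging
    return out


def sample_config(n_layers: int, n_models: int, n_remove: int,
                  rng: random.Random) -> List[Op]:
    """Draw a random point in the search space with exactly n_remove removals."""
    removed = set(rng.sample(range(n_layers), n_remove))
    config: List[Op] = []
    for i in range(n_layers):
        if i in removed:
            config.append(("remove",))
        elif rng.random() < 0.5:
            config.append(("select", rng.randrange(n_models)))
        else:
            # strictly positive merge weights so the average is well defined
            config.append(("merge", tuple(rng.random() + 1e-3 for _ in range(n_models))))
    return config


def random_search(candidates: Sequence[Model], n_remove: int,
                  score: Callable[[Model], float], budget: int = 200, seed: int = 0):
    """Zero-order search: only evaluates score() on tailored models, keeps the best."""
    rng = random.Random(seed)
    n_layers, n_models = len(candidates[0]), len(candidates)
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):
        cfg = sample_config(n_layers, n_models, n_remove, rng)
        s = score(tailor(candidates, cfg))
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score


# Three toy "finetunes", each with 8 layers of 4 weights.
data_rng = random.Random(42)
candidates = [[[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
              for _ in range(3)]


def toy_score(model: Model) -> float:
    # Hypothetical proxy; a real run would evaluate the tailored LLM on held-out tasks.
    return -sum((sum(layer) - 1.0) ** 2 for layer in model)


cfg, s = random_search(candidates, n_remove=2, score=toy_score)
print(sum(op[0] == "remove" for op in cfg), "layers removed, score", round(s, 3))
```

Random search is used only because it is the simplest black-box optimizer to show; any method that needs nothing but objective evaluations (for example Bayesian optimization or evolutionary search) fits the same `score` interface.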

Code Implementations: 1 repo