CL · Nov 24, 2025

Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models

Yang Xiang, Yixin Ji, Juntao Li, Min Zhang

arXiv:2511.18864v1 · 1 citation

Originality: Incremental advance

AI Analysis

This work addresses the high inference cost of LRMs for users who need faster inference. It is incremental, however: rather than introducing a new pruning technique, it adapts existing ones to a new model type by changing the calibration data they use.

The paper tackles pruning Large Reasoning Models (LRMs) to reduce inference overhead. It shows that calibrating with self-generated reasoning data improves pruning performance, and its experiments report a 10%-13% gain in the reasoning ability of pruned models over general pruning methods.

Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex reasoning benchmarks. However, their long chain-of-thought reasoning processes incur significant inference overhead. Pruning has emerged as a promising approach to reducing computational costs. However, existing efforts have primarily focused on large language models (LLMs), while pruning LRMs remains unexplored. In this work, we conduct the first empirical study on pruning LRMs and show that directly applying existing pruning techniques fails to yield satisfactory results. Our findings indicate that using self-generated reasoning data for calibration can substantially improve pruning performance. We further investigate how the difficulty and length of reasoning data affect pruning outcomes. Our analysis reveals that challenging and moderately long self-generated reasoning data serve as ideal calibration data. Based on these insights, we propose a Selective Self-Generated Reasoning (SSGR) data construction strategy to provide effective calibration data for pruning LRMs. Experimental results on the DeepSeek-R1-Distill model series validate that our strategy improves the reasoning ability of pruned LRMs by 10%-13% compared to general pruning methods.
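The selection idea described in the abstract (keep self-generated reasoning traces that are challenging and moderately long) can be sketched as a simple filter-and-rank step. This is a minimal illustration under stated assumptions, not the authors' SSGR implementation: the Sample fields, the integer difficulty label, the token-length window, and the ranking rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One self-generated calibration candidate (fields are hypothetical)."""
    problem: str
    difficulty: int   # assumed difficulty label: higher means harder
    trace: str        # reasoning trace generated by the unpruned model itself
    num_tokens: int   # token length of the trace


def select_calibration(
    samples: List[Sample],
    n: int,
    min_difficulty: int,
    min_tokens: int,
    max_tokens: int,
) -> List[Sample]:
    """Pick up to n samples that are challenging and moderately long.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    # 1. Difficulty filter: keep only challenging problems.
    hard = [s for s in samples if s.difficulty >= min_difficulty]
    # 2. Length filter: drop traces that are too short or too long.
    moderate = [s for s in hard if min_tokens <= s.num_tokens <= max_tokens]
    # 3. Rank: harder first; among equals, prefer lengths near the window centre.
    centre = (min_tokens + max_tokens) / 2
    moderate.sort(key=lambda s: (-s.difficulty, abs(s.num_tokens - centre)))
    return moderate[:n]


if __name__ == "__main__":
    pool = [
        Sample("p1", 1, "...", 900),   # too easy
        Sample("p2", 5, "...", 120),   # hard but too short
        Sample("p3", 5, "...", 1100),  # hard, moderate length
        Sample("p4", 4, "...", 2000),  # hard, moderate length
        Sample("p5", 5, "...", 9000),  # hard but too long
    ]
    chosen = select_calibration(pool, n=2, min_difficulty=4, min_tokens=512, max_tokens=4096)
    print([s.problem for s in chosen])  # ['p3', 'p4']
```

The surviving traces would then serve as the calibration set passed to a post-training pruning method, which runs them through the model to collect the activation statistics it uses to decide which weights to remove.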
