CLMay 20, 2025

Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning

arXiv:2505.14582v216 citationsh-index: 8
Originality Incremental advance
AI Analysis

This addresses the challenge of distilling complex reasoning into efficient small models, though it appears incremental as it builds on existing pruning and CoT compression methods.

The paper tackles the problem of compressing verbose chain-of-thought reasoning for small language models by proposing Prune-on-Logic, a framework that prunes low-utility reasoning steps, finding that verification pruning improves accuracy while reducing token usage across tasks and model scales.

Long chain-of-thought (Long-CoT) reasoning improves accuracy in LLMs, yet its verbose, self-reflective style often hinders effective distillation into small language models (SLMs). We revisit Long-CoT compression through the lens of capability alignment and ask: Can pruning improve reasoning? We propose Prune-on-Logic, a structure-aware framework that transforms Long-CoT into logic graphs and selectively prunes low-utility reasoning steps under self-verification constraints. Through systematic analysis across three pruning strategies - targeting entire chains, core reasoning, and verification - we find that verification pruning consistently improves accuracy while reducing token usage, whereas reasoning or indiscriminate pruning degrades performance. Our study reveals that effective pruning aligns supervision with model capacity rather than merely shortening inputs. Gains hold across tasks, model scales, and CoT capability, with larger models benefiting more from pruning due to richer but more redundant reasoning. Our empirical findings highlight pruning as a structural optimization strategy for aligning CoT reasoning with SLM capacity.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes