Jun 3, 2025

TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression

arXiv:2506.02678v3 · 7 citations · h-index: 26 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of efficient LLM reasoning, which matters to both researchers and practitioners, but the contribution appears incremental because it builds on existing CoT and re-weighting techniques.

The paper tackles the problem of inefficient language reasoning in LLMs during inference with long outputs by proposing a dynamic ratio-based training pipeline that balances weights between System-1 and System-2 data, resulting in a nearly 40% reduction in output tokens while maintaining reasoning accuracy.

Large Language Models (LLMs) have recently achieved remarkable progress by leveraging Reinforcement Learning and extended Chain-of-Thought (CoT) techniques. However, the challenge of performing efficient language reasoning, especially during inference with extremely long outputs, has drawn increasing attention from the research community. In this work, we propose a dynamic ratio-based training pipeline that does not rely on sophisticated data annotations or interpolation between multiple models. We continuously balance the weights between the model's System-1 and System-2 data to eliminate redundant reasoning processes while preserving the model's reasoning capability. We validate our approach on DeepSeek-R1-Distill-7B and DeepSeek-R1-Distill-14B and on a diverse set of benchmarks with varying difficulty levels. Our method reduces the number of output tokens by nearly 40% while maintaining reasoning accuracy. Our code and data will be available soon.
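The abstract does not spell out how the System-1/System-2 balance is updated, so the following is only an illustrative sketch of what a dynamic ratio-based data mix could look like, not the authors' method. The update rule, the function names `update_ratio` and `sample_batch`, and the parameters `step` and `tolerance` are all assumptions made for illustration: the share of short (System-1) examples grows while validation accuracy holds at the baseline and shrinks when accuracy drops.

```python
import random


def update_ratio(ratio, accuracy, baseline_accuracy, step=0.1, tolerance=0.01):
    """Return a new System-1 share of the training mix, clipped to [0, 1].

    Hypothetical rule (not taken from the paper): if accuracy stays within
    `tolerance` of the baseline, shift weight toward short System-1 data;
    otherwise shift weight back toward long System-2 (CoT) data.
    """
    if accuracy >= baseline_accuracy - tolerance:
        ratio += step  # accuracy held: push toward shorter outputs
    else:
        ratio -= step  # accuracy dropped: restore long-reasoning data
    return min(1.0, max(0.0, ratio))


def sample_batch(system1, system2, ratio, batch_size, rng):
    """Draw a mixed batch in which about `ratio` of the items are System-1."""
    n_system1 = round(ratio * batch_size)
    batch = [rng.choice(system1) for _ in range(n_system1)]
    batch += [rng.choice(system2) for _ in range(batch_size - n_system1)]
    rng.shuffle(batch)
    return batch
```

In a training loop built on this sketch, `update_ratio` would be called after each evaluation round and `sample_batch` before each training step, so the mix drifts toward concise data only as long as accuracy is preserved.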

Code Implementations: 1 repo
