CL, CE, NA | Jun 3, 2025

TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression

Zhong-Zhi Li, Xiao Liang, Zihao Tang, Lei Ji, Peijie Wang, Haotian Xu, Xing W, Haizhen Huang, Weiwei Deng, Yeyun Gong, Zhijiang Guo, Xiao Liu

arXiv:2506.02678v3 | 17.67 citations | h-index: 26 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient LLM reasoning for researchers and practitioners, but it appears incremental because it builds on existing CoT and re-weighting techniques.

The paper tackles the problem of inefficient language reasoning in LLMs during inference with long outputs by proposing a dynamic ratio-based training pipeline that balances weights between System-1 and System-2 data, resulting in a nearly 40% reduction in output tokens while maintaining reasoning accuracy.

Large Language Models (LLMs) have recently achieved remarkable progress by leveraging Reinforcement Learning and extended Chain-of-Thought (CoT) techniques. However, the challenge of performing efficient language reasoning--especially during inference with extremely long outputs--has drawn increasing attention from the research community. In this work, we propose a dynamic ratio-based training pipeline that does not rely on sophisticated data annotations or interpolation between multiple models. We continuously balance the weights between the model's System-1 and System-2 data to eliminate redundant reasoning processes while preserving the model's reasoning capability. We validate our approach on DeepSeek-R1-Distill-7B and DeepSeek-R1-Distill-14B across a diverse set of benchmarks with varying difficulty levels. Our method reduces the number of output tokens by nearly 40% while maintaining reasoning accuracy. Our code and data will be available soon.
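
The abstract does not spell out how the System-1/System-2 ratio is adjusted, so the following is only a minimal illustrative sketch of one way a dynamic ratio could work, not the authors' implementation. It assumes a single mixing ratio that sets the share of short (System-1) examples in each training batch, and a simple rule that raises the ratio while a held-out accuracy stays at or above a target and lowers it otherwise. The names `update_ratio`, `sample_batch`, `target_accuracy`, and `step` are all hypothetical.

```python
import random


def update_ratio(ratio, accuracy, target_accuracy, step=0.1):
    """Return a new System-1 sampling ratio, clamped to [0, 1].

    Assumed rule (not from the paper): shift weight toward short
    System-1 data while accuracy holds, and back toward long
    System-2 data when accuracy falls below the target.
    """
    if accuracy >= target_accuracy:
        ratio += step  # accuracy holds: compress further
    else:
        ratio -= step  # accuracy dropped: restore long reasoning data
    return min(1.0, max(0.0, ratio))


def sample_batch(system1, system2, ratio, batch_size, rng):
    """Draw a mixed batch with roughly `ratio` of it from System-1 data."""
    n_short = round(ratio * batch_size)
    batch = [rng.choice(system1) for _ in range(n_short)]
    batch += [rng.choice(system2) for _ in range(batch_size - n_short)]
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    system1 = ["short answer"]            # placeholder concise examples
    system2 = ["long chain of thought"]   # placeholder extended-CoT examples
    ratio = 0.25
    # Placeholder accuracies standing in for periodic held-out evaluations.
    for accuracy in [0.90, 0.91, 0.78, 0.88]:
        batch = sample_batch(system1, system2, ratio, batch_size=8, rng=rng)
        print(f"ratio={ratio:.2f} short_in_batch={batch.count('short answer')}")
        ratio = update_ratio(ratio, accuracy, target_accuracy=0.85)
```

In a real pipeline the sampled batches would feed a fine-tuning step and the accuracy would come from an actual evaluation run; both are stubbed here with placeholder values.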

