AI CL Jan 8

ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning

Minda Hu, Zexuan Qiu, Zenan Xu, Kun Li, Bo Zhou, Irwin King

arXiv:2601.04973v16.02 citations h-index: 6

Originality Highly original

AI Analysis

This work addresses inference efficiency in reasoning models, offering a new method that reduces computational overhead while maintaining performance, though its advance over existing compression techniques is incremental.

The paper tackles the problem of 'overthinking' in Large Reasoning Models, where redundant reasoning paths increase computational costs without improving accuracy, by introducing ConMax, a reinforcement learning framework that compresses reasoning traces to reduce inference length by 43% with only a 0.7% accuracy drop.

Recent breakthroughs in Large Reasoning Models (LRMs) have demonstrated that extensive Chain-of-Thought (CoT) generation is critical for enabling intricate cognitive behaviors, such as self-verification and backtracking, to solve complex tasks. However, this capability often leads to 'overthinking', where models generate redundant reasoning paths that inflate computational costs without improving accuracy. While Supervised Fine-Tuning (SFT) on reasoning traces is a standard paradigm for the 'cold start' phase, applying existing compression techniques to these traces often compromises logical coherence or incurs prohibitive sampling costs. In this paper, we introduce ConMax (Confidence-Maximizing Compression), a novel reinforcement learning framework designed to automatically compress reasoning traces while preserving essential reasoning patterns. ConMax formulates compression as a reward-driven optimization problem, training a policy to prune redundancy by maximizing a weighted combination of answer confidence for predictive fidelity and thinking confidence for reasoning validity through a frozen auxiliary LRM. Extensive experiments across five reasoning datasets demonstrate that ConMax achieves a superior efficiency-performance trade-off. Specifically, it reduces inference length by 43% over strong baselines at the cost of a mere 0.7% dip in accuracy, proving its effectiveness in generating high-quality, efficient training data for LRMs.
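
The weighted reward described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how either confidence is computed or how the two terms are weighted, so everything concrete here is an assumption for illustration: confidence is taken to be the geometric-mean token probability that a frozen auxiliary model assigns to a span, and the names `mean_token_confidence`, `conmax_style_reward`, and `alpha` are hypothetical rather than taken from the paper.

```python
import math
from typing import Sequence


def mean_token_confidence(logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability, exp(mean log-prob), in (0, 1].

    ASSUMPTION: this is one common way to turn per-token log-probabilities
    into a single confidence score; the paper may define it differently.
    """
    if not logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(logprobs) / len(logprobs))


def conmax_style_reward(
    answer_logprobs: Sequence[float],
    thinking_logprobs: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Weighted combination of answer confidence and thinking confidence.

    answer_logprobs:   log-probs the frozen auxiliary model assigns to the
                       reference answer tokens, given the compressed trace.
    thinking_logprobs: log-probs the same model assigns to the tokens of
                       the compressed trace itself.
    alpha:             hypothetical mixing weight in [0, 1] (not from the paper).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    answer_conf = mean_token_confidence(answer_logprobs)
    thinking_conf = mean_token_confidence(thinking_logprobs)
    return alpha * answer_conf + (1.0 - alpha) * thinking_conf
```

In a reinforcement learning loop, a score of this kind would be computed for each candidate compressed trace sampled from the policy, so that traces which keep the auxiliary model confident in both the answer and the reasoning receive higher reward than traces that prune too aggressively.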
