LG | AI | Feb 10

Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization

arXiv:2602.10048v1 | 1 citation | h-index: 12
Originality: Incremental advance
AI Analysis

The paper addresses the computational cost of verbose reasoning for users of LLMs, but the advance is incremental because it builds on Group Relative Policy Optimization (GRPO).

The paper tackles the problem of unnecessarily verbose Chain-of-Thought reasoning in Large Language Models, which increases computational costs, by proposing Fine-grained Group policy Optimization (FGO) to compress CoT without performance degradation.

Large Language Models (LLMs) often generate unnecessarily verbose Chain-of-Thought (CoT) reasoning that increases computational costs and latency without proportional performance gains. In this paper, we propose Fine-grained Group policy Optimization (FGO), a Reinforcement Learning (RL) algorithm that refines group responses by subdividing them and assigning appropriate weights based on length and entropy, thereby enabling effective CoT compression. As an enhanced variant of Group Relative Policy Optimization (GRPO), FGO also addresses two major limitations of GRPO: inefficient data utilization and entropy collapse. We evaluate FGO on multiple reasoning LLMs and benchmarks, including MATH500, AIME24, AMC23, and Minerva. Experimental results show that FGO achieves efficient CoT compression without degrading performance while resolving these limitations of GRPO.
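
The abstract does not give FGO's formulas, so the following is only a rough sketch of the general idea, not the paper's method. `grpo_advantages` is the standard GRPO group-relative advantage (reward minus group mean, divided by group standard deviation). Everything else is an assumption made for illustration: the split into a "correct" subgroup by `reward > 0`, the softmax weighting in `subgroup_weights`, and the parameters `alpha` and `beta` are hypothetical.

```python
import math
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Standard GRPO advantage: normalize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def subgroup_weights(lengths, entropies, alpha=1.0, beta=1.0):
    """HYPOTHETICAL weighting (not from the paper).

    Shorter responses and higher-entropy responses get larger weights.
    Weights are a softmax rescaled so that their mean is 1.
    """
    max_len = max(lengths)
    max_ent = max(entropies) or 1.0  # avoid dividing by zero entropy
    scores = [
        -alpha * length / max_len + beta * ent / max_ent
        for length, ent in zip(lengths, entropies)
    ]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [len(scores) * e / total for e in exps]


def weighted_advantages(rewards, lengths, entropies):
    """Reweight advantages inside the correct subgroup only (an assumption)."""
    adv = grpo_advantages(rewards)
    correct = [i for i, r in enumerate(rewards) if r > 0]
    out = list(adv)
    if correct:
        weights = subgroup_weights(
            [lengths[i] for i in correct],
            [entropies[i] for i in correct],
        )
        for i, w in zip(correct, weights):
            out[i] = adv[i] * w
    return out


# Toy group of four sampled responses: two correct, two incorrect.
rewards = [1.0, 1.0, 0.0, 0.0]
lengths = [100, 400, 300, 200]      # token counts (made-up values)
entropies = [0.5, 0.5, 0.6, 0.4]    # mean token entropy (made-up values)
result = weighted_advantages(rewards, lengths, entropies)
```

In this toy group, both correct responses start with the same GRPO advantage, and the hypothetical weighting then favors the 100-token response over the 400-token one, which is the kind of pressure toward shorter reasoning the abstract describes.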
