AI · Jan 7

EntroCoT: Enhancing Chain-of-Thought via Adaptive Entropy-Guided Segmentation

Zihang Li, Yuhang Wang, Yikun Zong, Wenhan Yu, Xiaokun Yuan, Runhan Jiang, Zirui Liu, Tong Yang, Arthur Jiang

arXiv:2601.03769v26.01 citations · h-index: 5

Originality: Incremental advance

AI Analysis

This paper addresses a specific issue in improving the mathematical reasoning of large language models and represents an incremental improvement in dataset quality.

The paper tackles low-quality Chain-of-Thought supervision in fine-tuning datasets, where correct answers are derived from flawed reasoning steps. It proposes EntroCoT to automatically identify and refine these traces, and reports improved performance on mathematical benchmarks compared to full-dataset baselines.

Chain-of-Thought (CoT) prompting has significantly enhanced the mathematical reasoning capabilities of Large Language Models. We find that existing fine-tuning datasets frequently suffer from the "answer right but reasoning wrong" problem, where correct final answers are derived from hallucinated, redundant, or logically invalid intermediate steps. This paper proposes EntroCoT, a unified framework for automatically identifying and refining low-quality CoT supervision traces. EntroCoT first proposes an entropy-based mechanism to segment the reasoning trace into multiple steps at uncertain junctures, and then introduces a Monte Carlo rollout-based mechanism to evaluate the marginal contribution of each step. By accurately filtering deceptive reasoning samples, EntroCoT constructs a high-quality dataset where every intermediate step in each reasoning trace facilitates the final answer. Extensive experiments on mathematical benchmarks demonstrate that fine-tuning on the subset constructed by EntroCoT consistently outperforms the baselines of full-dataset supervision.
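
The two stages named in the abstract (entropy-based segmentation, then Monte Carlo rollouts to score each step) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the segmentation rule, the rollout procedure, or the filtering criterion, so the fixed entropy threshold, the `rollout` callback, the function names, and the "keep only traces with no negative-gain step" rule are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Step = List[str]  # one reasoning step = a list of tokens


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def segment_by_entropy(
    tokens: Sequence[str], entropies: Sequence[float], threshold: float
) -> List[Step]:
    """Split a trace into steps, opening a new step at each high-entropy token.

    Assumption: an "uncertain juncture" is any token whose entropy exceeds a
    fixed threshold; the paper's adaptive rule may differ.
    """
    segments: List[Step] = []
    current: Step = []
    for tok, h in zip(tokens, entropies):
        if h > threshold and current:
            segments.append(current)
            current = []
        current.append(tok)
    if current:
        segments.append(current)
    return segments


def marginal_contributions(
    segments: Sequence[Step],
    rollout: Callable[[List[Step]], bool],
    n_rollouts: int = 8,
) -> List[float]:
    """Estimate how much each step changes the chance of a correct answer.

    `rollout(prefix)` is a hypothetical callback: it should sample one
    completion from a model given the step prefix and return True when the
    final answer is correct. The gain of step i is the success rate with
    steps 0..i minus the success rate with steps 0..i-1.
    """

    def success_rate(prefix: List[Step]) -> float:
        return sum(rollout(prefix) for _ in range(n_rollouts)) / n_rollouts

    gains: List[float] = []
    prefix: List[Step] = []
    previous = success_rate(prefix)
    for seg in segments:
        prefix = prefix + [seg]
        current = success_rate(prefix)
        gains.append(current - previous)
        previous = current
    return gains


def keep_trace(gains: Sequence[float], min_gain: float = 0.0) -> bool:
    """Assumed filter: keep a trace only if no step lowers the success rate."""
    return all(g >= min_gain for g in gains)
```

In practice `rollout` would wrap a language model call and an answer checker, and the per-token entropies would come from the model's output distribution over the trace. A stochastic `rollout` makes the gains noisy, so `n_rollouts` and `min_gain` would need tuning.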
