CLAILGNov 29, 2024

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

arXiv:2411.19943v373 citationsh-index: 11Has CodeICML
Originality Highly original
AI Analysis

This addresses the problem of precise logical deduction in AI systems for mathematical reasoning tasks, representing an incremental advance through a novel token-level framework.

The paper tackles the challenge of improving large language models' mathematical reasoning by identifying critical tokens that influence incorrect outcomes, and shows that replacing these tokens significantly enhances accuracy on benchmarks like GSM8K and MATH500 with models such as Llama-3 and Deepseek-math.

Mathematical reasoning tasks pose significant challenges for large language models (LLMs) because they require precise logical deduction and sequence analysis. In this work, we introduce the concept of critical tokens -- elements within reasoning trajectories that significantly influence incorrect outcomes. We present a novel framework for identifying these tokens through rollout sampling and demonstrate their substantial divergence from traditional error tokens. Through extensive experiments on datasets such as GSM8K and MATH500, we show that identifying and replacing critical tokens significantly improves model accuracy. We propose an efficient methodology for pinpointing these tokens in large-scale datasets using contrastive estimation and extend this framework to enhance model training processes with direct preference optimization (DPO). Experimental results on GSM8K and MATH500 benchmarks with the widely used models Llama-3 (8B and 70B) and Deepseek-math (7B) demonstrate the effectiveness of the proposed approach, cDPO. Our results underscore the potential of leveraging critical tokens to reduce errors in reasoning tasks, advancing the development of AI systems capable of robust logical deduction. Our code, annotated datasets, and trained models are available at https://github.com/chenzhiling9954/Critical-Tokens-Matter to support and encourage future research in this promising field.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes