LG | Oct 27, 2025

Learning to Reason Efficiently with Discounted Reinforcement Learning

Alex Ayoub, Kavosh Asadi, Dale Schuurmans, Csaba Szepesvári, Karim Bouyarmane

arXiv:2510.23486v1 | 2 citations | h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses computational efficiency for users of large reasoning models, though it is incremental as it builds on existing reinforcement learning methods.

The paper tackles the problem of large reasoning models consuming excessive tokens, which increases computational cost and latency. It uses discounted reinforcement learning to penalize reasoning tokens, which shortens chains of thought while preserving accuracy.

Large reasoning models (LRMs) often consume excessive tokens, inflating computational cost and latency. We challenge the assumption that longer responses improve accuracy. By penalizing reasoning tokens using a discounted reinforcement learning setup (interpretable as a small token cost) and analyzing Blackwell optimality in restricted policy classes, we encourage concise yet accurate reasoning. Experiments confirm our theoretical results that this approach shortens chains of thought while preserving accuracy.
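As a rough illustration of why discounting can be read as a small token cost, the sketch below compares a discounted return with its first-order linear approximation. It is not the authors' implementation. The function names, the reward of 1.0 for a correct answer (0.0 otherwise), the assumption that reward arrives only at the final token, and the value gamma = 0.999 are all hypothetical choices made for this example.

```python
def discounted_return(num_tokens: int, correct: bool, gamma: float = 0.999) -> float:
    """Discounted value of a terminal reward received after num_tokens tokens.

    Assumption: reward is 1.0 for a correct answer and 0.0 otherwise,
    and it is discounted once per generated token.
    """
    reward = 1.0 if correct else 0.0
    return (gamma ** num_tokens) * reward


def token_cost_return(num_tokens: int, correct: bool, gamma: float = 0.999) -> float:
    """First-order approximation of the discounted return.

    For gamma close to 1, gamma**T is approximately 1 - (1 - gamma) * T,
    so each token behaves like a per-token cost of (1 - gamma) scaled by the reward.
    """
    reward = 1.0 if correct else 0.0
    return reward * (1.0 - (1.0 - gamma) * num_tokens)


if __name__ == "__main__":
    # Compare a short and a long correct response, and an incorrect one.
    for tokens, correct in [(100, True), (200, True), (100, False)]:
        exact = discounted_return(tokens, correct)
        approx = token_cost_return(tokens, correct)
        print(f"tokens={tokens} correct={correct} exact={exact:.4f} approx={approx:.4f}")
```

Under these assumptions, a shorter correct response scores higher than a longer correct one, while any correct response still scores higher than an incorrect one, which matches the intuition of shortening chains of thought without trading away accuracy.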
