AI · LG · Jul 1, 2025

Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess

Dongyoon Hwang, Hojoon Lee, Jaegul Choo, Dongmin Park, Jongho Park

arXiv:2507.00726v313.65 citations · h-index: 36 · Has Code

Originality: Incremental advance

AI Analysis

This work targets strategic reasoning in LLMs, a problem of interest to AI researchers. It is rated incremental because the method shows limited success and the study identifies fundamental limitations.

The study investigated whether large language models (LLMs) can develop strategic reasoning through reinforcement learning (RL) in chess. Its distillation-based dense reward method often outperformed sparse binary rewards, but all models plateaued far below expert level, indicating a deficit in the pretrained models' internal understanding of chess.

While reinforcement learning (RL) for large language models (LLMs) has shown promise in mathematical reasoning, strategic reasoning for LLMs using RL remains largely unexplored. We investigate whether LLMs can develop strategic reasoning capabilities through RL in chess. To this end, we leverage a chess-pretrained action-value network to provide a dense reward on the quality of the LLM's output move, which can be seen as a form of knowledge distillation. Our experiments show that our distillation-based dense rewards often outperform sparse binary rewards. However, surprisingly, all models plateau far below expert levels. We provide SFT and RL ablations on chess reasoning training and find evidence that this limitation stems from a deficit in the pretrained models' internal understanding of chess, a deficit that RL alone may not be able to fully overcome. The code is available at https://github.com/krafton-ai/Chess-R1.
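To make the contrast between the two reward types concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names, the move notation, and the action values are all hypothetical, and the abstract does not specify how the critic's outputs are turned into a reward. The sketch assumes a critic that returns an estimated value in [0, 1] for each legal move.

```python
from typing import Dict


def sparse_reward(predicted_move: str, q_values: Dict[str, float]) -> float:
    """Binary reward: 1.0 only if the move matches the critic's top-rated move."""
    best_move = max(q_values, key=q_values.get)
    return 1.0 if predicted_move == best_move else 0.0


def dense_reward(predicted_move: str, q_values: Dict[str, float]) -> float:
    """Graded reward: the critic's estimated value of the chosen move.

    Moves missing from q_values (treated here as illegal) receive 0.0.
    """
    return q_values.get(predicted_move, 0.0)


# Hypothetical critic output for one position (made-up values, not real
# engine or paper data): estimated value of each legal move.
q_values = {"e2e4": 0.56, "d2d4": 0.55, "g1f3": 0.54, "a2a3": 0.48}

for move in ("e2e4", "d2d4", "a2a3"):
    print(move, sparse_reward(move, q_values), dense_reward(move, q_values))
```

In this toy position the second-best move receives no credit under the sparse scheme but nearly full credit under the dense one, which is the sense in which a dense reward carries more learning signal per sample.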
