CV · CL · Jun 9, 2025

Play to Generalize: Learning to Reason Through Game Play

arXiv:2506.08011v4 · 12 citations · h-index: 18
Originality: Highly original
AI Analysis

This work addresses the problem of improving reasoning in multimodal AI models, with applications such as education and problem-solving. Its training approach is novel, though incremental, and shows strong gains.

The paper tackles the challenge of developing reasoning capabilities in multimodal large language models. It proposes Visual Game Learning (ViGaL), a post-training method in which models play arcade-like games such as Snake and are trained with reinforcement learning. Without exposure to domain-specific data, this significantly improves performance on multimodal math, multi-discipline, and 3D spatial reasoning benchmarks, outperforming specialist models while preserving general visual capabilities.

Developing reasoning capabilities in multimodal large language models (MLLMs) remains challenging. Motivated by literature suggesting that gameplay promotes transferable reasoning skills, we propose a novel post-training method, Visual Game Learning (ViGaL), where MLLMs develop generalizable reasoning skills through playing arcade-like games. Specifically, we show that training a 7B-parameter MLLM via reinforcement learning (RL) on simple games like Snake significantly enhances the downstream performance on multimodal math benchmarks like MathVista, on multi-discipline questions like MMMU and on 3D spatial reasoning benchmarks like VSI-Bench, without seeing any worked solutions, equations, or diagrams during RL. Remarkably, our model outperforms specialist models post-trained on benchmark-oriented multimodal reasoning data, while preserving the model's performance on general visual benchmarks, a challenge where specialist models often fall short. Our findings suggest that multimodal reasoning can emerge from gameplay, pointing to a promising strategy of designing surrogate tasks for RL post-training.
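The abstract says the RL signal comes from playing simple games like Snake rather than from worked solutions or equations. The sketch below illustrates what a verifiable, rule-based game reward of that kind could look like, assuming a grid-based Snake where the model answers with a single move. The `SnakeState` layout, the `reward` function, and the "move safely toward the apple" criterion are hypothetical illustrations, not ViGaL's actual environment or reward; the rendered game image that a multimodal model would see is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Cell = Tuple[int, int]

# Hypothetical move encoding: direction -> (dx, dy); the y axis grows downward.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class SnakeState:
    width: int
    height: int
    body: List[Cell]  # snake cells, head first
    apple: Cell


def step(state: SnakeState, move: str) -> Tuple[SnakeState, bool, bool]:
    """Apply one move and return (new_state, died, ate)."""
    dx, dy = MOVES[move]
    hx, hy = state.body[0]
    head = (hx + dx, hy + dy)
    ate = head == state.apple
    # Without eating, the tail advances this step, so its old cell is free.
    body_after = state.body if ate else state.body[:-1]
    out_of_bounds = not (0 <= head[0] < state.width and 0 <= head[1] < state.height)
    died = out_of_bounds or head in body_after
    return SnakeState(state.width, state.height, [head] + body_after, state.apple), died, ate


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def reward(state: SnakeState, answer: str) -> float:
    """Hypothetical binary reward for a model's text answer naming one move."""
    move = answer.strip().lower()
    if move not in MOVES:
        return 0.0  # unparseable answer earns nothing
    new_state, died, ate = step(state, move)
    if died:
        return 0.0  # hitting a wall or the snake's own body
    if ate:
        return 1.0
    # Otherwise reward only moves that bring the head closer to the apple.
    before = manhattan(state.body[0], state.apple)
    after = manhattan(new_state.body[0], state.apple)
    return 1.0 if after < before else 0.0


if __name__ == "__main__":
    s = SnakeState(5, 5, [(2, 2), (2, 3), (2, 4)], apple=(4, 2))
    for answer in ["right", "down", "left", "jump"]:
        print(answer, reward(s, answer))
```

Because the reward is computed by game rules rather than a learned judge, it can be checked automatically for every rollout, which is what makes a game usable as a surrogate RL task in this sketch.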

Code implementations: 1 repo
