LG · AI · Jun 11, 2025

Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban

arXiv:2506.10138v1 · 2 citations · h-index: 16
Originality: Synthesis-oriented
AI Analysis

This work offers interpretability insights into how RL agents use test-time compute for planning, though it is incremental: it analyzes one specific trained model rather than proposing new methods.

The authors partially reverse-engineer a convolutional RNN trained with model-free RL to play Sokoban, showing that it learns mechanisms analogous to bidirectional search, including a value function and a transition model, that let it solve more levels when given more test-time compute.
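
For readers less familiar with the classical algorithm the summary refers to, here is a minimal sketch of textbook bidirectional breadth-first search on a 4-connected grid. It grows one frontier from the start and one from the goal, one full layer at a time, and stops once they meet. This is the familiar baseline, not the RNN's mechanism; the grid encoding (`#` for walls) and the names `neighbors` and `bidirectional_bfs` are hypothetical choices for illustration, and Sokoban's box-pushing rules are ignored.

```python
from __future__ import annotations

from collections import deque
from typing import Iterator


def neighbors(grid: list[str], p: tuple[int, int]) -> Iterator[tuple[int, int]]:
    """Yield open 4-connected neighbours of p; '#' marks a wall (assumed encoding)."""
    r, c = p
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[nr]) and grid[nr][nc] != "#":
            yield (nr, nc)


def bidirectional_bfs(grid, start, goal) -> int | None:
    """Length of a shortest start-goal path, or None if the goal is unreachable."""
    if start == goal:
        return 0
    dist_f, dist_b = {start: 0}, {goal: 0}  # exact distances from each end
    q_f, q_b = deque([start]), deque([goal])  # current frontier layers
    while q_f and q_b:
        # Expand the smaller frontier by one full BFS layer.
        if len(q_f) <= len(q_b):
            q, dist, other = q_f, dist_f, dist_b
        else:
            q, dist, other = q_b, dist_b, dist_f
        best = None
        for _ in range(len(q)):
            p = q.popleft()
            for n in neighbors(grid, p):
                if n in dist:
                    continue
                dist[n] = dist[p] + 1
                if n in other:  # the two searches meet at n
                    total = dist[n] + other[n]
                    best = total if best is None else min(best, total)
                q.append(n)
        if best is not None:
            return best  # finishing the whole layer makes the minimum exact
    return None
```

For example, `bidirectional_bfs(["S..#", ".#..", "...G"], (0, 0), (2, 3))` returns `5`. Expanding complete layers before returning is what guarantees the reported length is a shortest one, rather than just the first meeting found.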

We partially reverse-engineer a convolutional recurrent neural network (RNN) trained to play the puzzle game Sokoban with model-free reinforcement learning. Prior work found that this network solves more levels with more test-time compute. Our analysis reveals several mechanisms analogous to components of classic bidirectional search. For each square, the RNN represents its plan in the activations of channels associated with specific directions. These state-action activations are analogous to a value function: their magnitudes determine when to backtrack and which plan branch survives pruning. Specialized kernels extend these activations (containing plan and value) forward and backward to create paths, forming a transition model. The algorithm is also unlike classical search in some ways. State representation is not unified; instead, the network considers each box separately. Each layer has its own plan representation and value function, increasing search depth. Far from being inscrutable, the mechanisms leveraging test-time compute learned in this network by model-free training can be understood in familiar terms.
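
As a rough illustration of the value-function and transition-model analogy in the abstract, the sketch below gives each open square four direction channels (U, D, L, R). A hypothetical "kernel" repeatedly copies the strongest activation at the next square back onto the current square, scaled by an assumed `decay`, so magnitudes fall off with distance to the target and behave like a value estimate; winner-take-all pruning then keeps one channel per square. This is a caricature under stated assumptions, not the network's learned kernels: it models only backward extension from a single target, ignores boxes and per-layer plans, and every name (`extend_backward`, `prune`, `read_plan`, `ticks`, `decay`) is hypothetical.

```python
# Hypothetical toy model of direction-channel plans; not the paper's code.
DIRS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def open_squares(grid):
    """All non-wall squares; '#' marks a wall (assumed encoding)."""
    return {(r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch != "#"}


def extend_backward(grid, target, ticks, decay=0.9):
    """Return act[(square, direction)] after `ticks` recurrent updates.

    act[(s, d)] > 0 means "moving d from s is part of a plan to the target";
    its magnitude decays with distance to the target, like a value estimate.
    """
    squares = open_squares(grid)
    act = {(s, d): 0.0 for s in squares for d in DIRS}
    for _ in range(ticks):
        new = {}
        for s in squares:
            for d, (dr, dc) in DIRS.items():
                t = (s[0] + dr, s[1] + dc)
                if t == target:
                    new[(s, d)] = 1.0  # this move reaches the target directly
                elif t in squares:
                    # Toy "kernel": pull the best activation at t back onto s.
                    new[(s, d)] = decay * max(act[(t, d2)] for d2 in DIRS)
                else:
                    new[(s, d)] = 0.0  # wall or off-grid
        act = new
    return act


def prune(act):
    """Winner-take-all per square: keep only the strongest positive channel."""
    best = {}
    for (s, d), v in act.items():
        if v > 0 and (s not in best or v > act[(s, best[s])]):
            best[s] = d
    return best


def read_plan(grid, agent, target, ticks, decay=0.9):
    """Follow the pruned channels from the agent; return a move string or None."""
    plan = prune(extend_backward(grid, target, ticks, decay))
    moves, pos = [], agent
    while pos != target:
        if pos not in plan or len(moves) > len(grid) * len(grid[0]):
            return None  # the plan has not (yet) reached this square
        d = plan[pos]
        moves.append(d)
        pos = (pos[0] + DIRS[d][0], pos[1] + DIRS[d][1])
    return "".join(moves)
```

In this caricature, `ticks` stands in for test-time compute: on the corridor `["...", "##.", "..."]` with the agent at `(0, 0)` and the target at `(2, 0)`, `ticks=5` leaves the plan short of the agent and `read_plan` returns `None`, while `ticks=6` yields `"RRDDLL"`.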

Code Implementations: 1 repo