LG AI Jul 22, 2024

Planning in a recurrent neural network that plays Sokoban

Mohammad Taufeeque, Philip Quirke, Maximilian Li, Chris Cundy, Aaron David Tucker, Adam Gleave, Adrià Garriga-Alonso

arXiv:2407.15421v316.413 citations h-index: 16 Has Code

Originality: Incremental advance

AI Analysis

This work provides insights into learned planning in neural networks, with potential applications in AI for sequential decision-making, though it is incremental as it builds on prior work in a specific domain.

The researchers analyzed a recurrent neural network trained on Sokoban puzzles to understand its planning mechanisms. They found that it holds a causal plan representation that predicts its future actions about 50 steps in advance, and that it "paces" in cycles at the start of a level to give itself extra computation. They then extended the trained model to larger, out-of-distribution puzzles, showing that its representations remain robust beyond the training regime.

Planning is essential for solving complex tasks, yet the internal mechanisms underlying planning in neural networks remain poorly understood. Building on prior work, we analyze a recurrent neural network (RNN) trained on Sokoban, a challenging puzzle requiring sequential, irreversible decisions. We find that the RNN has a causal plan representation which predicts its future actions about 50 steps in advance. The quality and length of the represented plan increase over the first few steps. We uncover a surprising behavior: the RNN "paces" in cycles to give itself extra computation at the start of a level, and show that this behavior is incentivized by training. Leveraging these insights, we extend the trained RNN to significantly larger, out-of-distribution Sokoban puzzles, demonstrating robust representations beyond the training regime. We open-source our model and code, and believe the neural network's interesting behavior makes it an excellent model organism to deepen our understanding of learned planning.
