AI · Feb 26

On Sample-Efficient Generalized Planning via Learned Transition Models

arXiv:2602.23148v1 · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses sample-efficient generalized planning, proposing learned transition models as an alternative to Transformer-based planners that predict action sequences directly.

This paper tackles generalized planning by formulating it as a transition-model learning problem, where a neural model approximates the successor-state function and generates plans by rolling out symbolic state trajectories. The authors report that this approach achieves higher out-of-distribution satisficing-plan success than direct action-sequence prediction in multiple domains, while using significantly fewer training instances and smaller models.

Generalized planning studies the construction of solution strategies that generalize across families of planning problems sharing a common domain model, formally defined by a transition function $\gamma: S \times A \rightarrow S$. Classical approaches achieve such generalization through symbolic abstractions and explicit reasoning over $\gamma$. In contrast, recent Transformer-based planners, such as PlanGPT and Plansformer, largely cast generalized planning as direct action-sequence prediction, bypassing explicit transition modeling. While effective on in-distribution instances, these approaches typically require large datasets and model sizes, and often suffer from state drift in long-horizon settings due to the absence of explicit world-state evolution. In this work, we formulate generalized planning as a transition-model learning problem, in which a neural model explicitly approximates the successor-state function $\hat{\gamma} \approx \gamma$ and generates plans by rolling out symbolic state trajectories. Instead of predicting actions directly, the model autoregressively predicts intermediate world states, thereby learning the domain dynamics as an implicit world model. To study size-invariant generalization and sample efficiency, we systematically evaluate multiple state representations and neural architectures, including relational graph encodings. Our results show that learning explicit transition models yields higher out-of-distribution satisficing-plan success than direct action-sequence prediction in multiple domains, while achieving these gains with significantly fewer training instances and smaller models. This is an extended version of a short paper accepted at ICAPS 2026 under the same title.
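
The rollout idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy corridor domain, the names `gamma`, `applicable_actions`, `toy_successor_model`, and `rollout`, and in particular the step that recovers an action by matching the predicted state against the true one-step successors. The stand-in model is a hand-written rule, not a neural network; it only plays the role of the learned successor-state predictor.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

Atom = Tuple[str, int]            # e.g. ("at", 2): the robot is in cell 2
State = FrozenSet[Atom]           # a symbolic state is a set of ground atoms
Action = Tuple[str, int, int]     # ("move", source_cell, destination_cell)


def applicable_actions(state: State, n_cells: int) -> List[Action]:
    """Hypothetical toy domain: the robot may move one cell left or right."""
    pos = next(a[1] for a in state if a[0] == "at")
    return [("move", pos, q) for q in (pos - 1, pos + 1) if 0 <= q < n_cells]


def gamma(state: State, action: Action) -> State:
    """True transition function of the toy domain, S x A -> S."""
    _, src, dst = action
    return (state - {("at", src)}) | {("at", dst)}


def toy_successor_model(state: State, goal: State) -> State:
    """Stand-in for a learned model: predicts the next state, not an action."""
    pos = next(a[1] for a in state if a[0] == "at")
    target = next(a[1] for a in goal if a[0] == "at")
    step = (target > pos) - (target < pos)   # -1, 0, or +1 toward the goal
    return frozenset({("at", pos + step)})


def rollout(
    model: Callable[[State, State], State],
    init: State,
    goal: State,
    n_cells: int,
    max_steps: int = 50,
) -> Optional[List[Action]]:
    """Autoregressively roll out predicted states and recover a plan.

    Returns None if a predicted state is not a valid one-step successor
    or if the goal is not reached within max_steps.
    """
    state, plan = init, []
    for _ in range(max_steps):
        if goal <= state:                    # all goal atoms hold
            return plan
        predicted = model(state, goal)
        # Assumed grounding step: find an action whose true successor
        # equals the predicted state.
        match = next(
            (a for a in applicable_actions(state, n_cells)
             if gamma(state, a) == predicted),
            None,
        )
        if match is None:
            return None
        plan.append(match)
        state = predicted
    return plan if goal <= state else None


start = frozenset({("at", 0)})
target = frozenset({("at", 3)})
print(rollout(toy_successor_model, start, target, n_cells=5))
# prints [('move', 0, 1), ('move', 1, 2), ('move', 2, 3)]
```

In this sketch, checking each predicted state against the true successors means an invalid prediction is rejected at the step where it occurs. Whether and how the paper validates predicted states is not stated in the abstract.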

