World Programs for Model-Based Learning and Planning in Compositional State and Action Spaces
This paper addresses model-based learning in compositional spaces for AI planning, but it appears incremental: it builds on existing concepts and makes no broad state-of-the-art claims.
The paper tackles the problem of applying reinforcement learning in environments without cheap simulators. It proposes a formalism in which the learner induces a world program, modeling both the dynamics and the actions, from observed state transitions and then uses that program for planning tasks.
Some of the most important tasks take place in environments that lack cheap and perfect simulators, which hampers the application of model-free reinforcement learning (RL). Model-based RL aims to learn a dynamics model, but in the more general case the learner does not know a priori what the action space is. Here we propose a formalism in which the learner induces a world program, learning both a dynamics model and the actions in graph-based compositional environments, by observing state-to-state transition examples. The learner can then perform RL with the world program as the simulator for complex planning tasks. We highlight a recent application and propose a challenge for the community to assess world program-based planning.
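As a rough illustration of the idea in the abstract, the Python sketch below induces a toy "world program" from observed state-to-state transitions and then uses it as a simulator for planning. It is not the paper's method. It assumes that a graph state can be represented by its edge set, that an action is an edge-level edit (edges added, edges removed), and it uses breadth-first search as a stand-in for the RL step. All names (`induce_actions`, `step`, `plan`) are hypothetical.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str]
State = FrozenSet[Edge]                            # assumption: a graph state is its edge set
Action = Tuple[FrozenSet[Edge], FrozenSet[Edge]]   # assumption: (edges added, edges removed)


def induce_actions(transitions: Iterable[Tuple[State, State]]) -> Set[Action]:
    """Collect the distinct edge-level edits seen in observed (s, s') pairs."""
    actions: Set[Action] = set()
    for before, after in transitions:
        added = frozenset(after - before)
        removed = frozenset(before - after)
        if added or removed:
            actions.add((added, removed))
    return actions


def step(state: State, action: Action) -> Optional[State]:
    """Toy dynamics model: apply an induced action, or return None if it does not apply."""
    added, removed = action
    # An edit applies only if every removed edge exists and no added edge exists yet.
    if not removed <= state or added & state:
        return None
    return frozenset((state - removed) | added)


def plan(start: State, goal: State, actions: Set[Action],
         max_depth: int = 10) -> Optional[List[Action]]:
    """Breadth-first search that uses the induced world program as its simulator."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        if len(path) >= max_depth:
            continue
        for action in actions:
            nxt = step(state, action)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


# Two observed transitions: one adds edge (b, c), the other removes edge (a, b).
transitions = [
    (frozenset({("a", "b")}), frozenset({("a", "b"), ("b", "c")})),
    (frozenset({("a", "b"), ("c", "d")}), frozenset({("c", "d")})),
]
actions = induce_actions(transitions)

# Plan on a start/goal pair that never appeared together in the observations.
start = frozenset({("a", "b")})
goal = frozenset({("b", "c")})
path = plan(start, goal, actions)
```

The point of the sketch is the separation of concerns: the action set is not given in advance but recovered from transition examples, and the planner only ever queries the induced `step` function rather than the real environment. A learned policy could replace the search without changing the other two functions.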