Model-Based Reinforcement Learning via Latent-Space Collocation
This addresses a bottleneck in autonomous agents for temporally extended tasks, though it is an incremental adaptation of collocation methods to the image-based setting.
The paper tackles the challenge of long-horizon tasks in visual model-based reinforcement learning by planning sequences of states instead of actions, resulting in improved performance over previous methods on tasks with sparse rewards and long-term goals.
The ability to plan into the future while utilizing only raw high-dimensional observations, such as images, can provide autonomous agents with broad capabilities. Visual model-based reinforcement learning (RL) methods that plan future actions directly have shown impressive results on tasks that require only short-horizon reasoning, however, these methods struggle on temporally extended tasks. We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions, as the effects of actions greatly compound over time and are harder to optimize. To achieve this, we draw on the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, and adapt it to the image-based setting by utilizing learned latent state space models. The resulting latent collocation method (LatCo) optimizes trajectories of latent states, which improves over previously proposed shooting methods for visual model-based RL on tasks with sparse rewards and long-term goals. Videos and code at https://orybkin.github.io/latco/.