PcLast: Discovering Plannable Continuous Latent States
This work addresses inefficient planning in reinforcement learning by learning plannable latent states that account for state reachability, which existing representation learning approaches ignore.
The paper tackles the problem of learning latent representations for goal-conditioned planning by focusing on state reachability, which improves sample efficiency in reward-based settings and enables hierarchical planning with zero additional samples in reward-free settings.
Goal-conditioned planning benefits from learned low-dimensional representations of rich observations. While compact latent representations, typically learned with variational autoencoders or inverse dynamics, enable goal-conditioned decision making, they ignore state reachability, which hampers their performance. In this paper, we learn a representation that places mutually reachable states close together for effective planning and goal-conditioned policy learning. We first learn a latent representation with multi-step inverse dynamics (to remove distracting information), and then transform this representation so that reachable states lie close to one another in $\ell_2$ space. We test our proposals in various simulation testbeds. Numerical results in reward-based settings show significant improvements in sample efficiency. Further, in reward-free settings, this approach yields layered state abstractions that enable computationally efficient hierarchical planning for reaching ad hoc goals with zero additional samples.
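To make the second stage more concrete, the following is a minimal sketch of one way to pull reachable states together in $\ell_2$ space. It assumes a temporal contrastive objective over latents that an encoder has already produced. The abstract does not specify the loss, so the function names, the parameters `alpha` and `beta`, and the pair-sampling scheme are all hypothetical and may differ from the method actually used in the paper.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)


def reachability_contrastive_loss(z_anchor, z_pos, z_neg, alpha=1.0, beta=1.0):
    """Hypothetical contrastive loss on squared l2 distances between latents.

    z_anchor, z_pos, z_neg: arrays of shape (batch, dim).
    Positives are latents of states a few steps from the anchor on the same
    trajectory (assumed reachable); negatives are latents of random states.
    alpha and beta are illustrative offset and scale parameters.
    """
    d_pos = np.sum((z_anchor - z_pos) ** 2, axis=-1)
    d_neg = np.sum((z_anchor - z_neg) ** 2, axis=-1)
    # Reward a high "reachable" score for positives (small distance) ...
    pos_term = -log_sigmoid(alpha - beta * d_pos)
    # ... and a low one for negatives; log(1 - sigmoid(x)) == log_sigmoid(-x).
    neg_term = -log_sigmoid(-(alpha - beta * d_neg))
    return float(np.mean(pos_term + neg_term))


def sample_pairs(num_steps, max_gap, rng, batch):
    """Sample (anchor, positive, negative) time indices from one trajectory.

    Assumes num_steps > max_gap. Positives lie 1..max_gap steps after the
    anchor; negatives are drawn uniformly at random from the trajectory.
    """
    t = rng.integers(0, num_steps - max_gap, size=batch)
    k = rng.integers(1, max_gap + 1, size=batch)
    r = rng.integers(0, num_steps, size=batch)
    return t, t + k, r
```

In this sketch, minimizing the loss with respect to a learned map applied to the inverse-dynamics latents would shrink distances between temporally close states and grow distances to random ones. Clustering the resulting space is one plausible route to the layered abstractions mentioned above, although that step is not shown here.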