Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation
This paper addresses long-horizon planning for robotic manipulation in cluttered environments and represents an incremental improvement over existing model-based methods.
The paper tackles the challenge of planning effective action sequences for multi-step robotic manipulation by introducing the Cascaded Variational Inference (CAVIN) Planner, a model-based method that hierarchically generates plans by sampling from latent spaces. The reported results show that it outperforms state-of-the-art model-based methods on three cluttered tabletop tasks, although the abstract gives no specific numerical gains.
The fundamental challenge of planning for multi-step manipulation is to find effective and plausible action sequences that lead to the task goal. We present Cascaded Variational Inference (CAVIN) Planner, a model-based method that hierarchically generates plans by sampling from latent spaces. To facilitate planning over long time horizons, our method learns latent representations that decouple the prediction of high-level effects from the generation of low-level motions through cascaded variational inference. This enables us to model dynamics at two levels of temporal resolution for hierarchical planning. We evaluate our approach in three multi-step robotic manipulation tasks in cluttered tabletop environments given high-dimensional observations. Empirical results demonstrate that the proposed method outperforms state-of-the-art model-based methods by strategically interacting with multiple objects.
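To make the two-level idea concrete, the toy sketch below plans for a 2D point rather than a manipulated object. It is not the CAVIN implementation: the functions `meta_dynamics`, `action_generator`, and `dynamics` are hypothetical hand-written stand-ins for the learned networks, the action generator is conditioned on the predicted subgoal state (an assumption for simplicity), and both levels use plain best-of-K random sampling as an assumed search procedure. It only illustrates the structure described above: sample high-level effect codes to choose subgoals at a coarse time scale, then sample low-level motion codes to produce actions that reach each subgoal.

```python
import math
import random

# Toy sketch of hierarchical latent-space planning. All models are
# hypothetical stand-ins for learned networks; only the two-level
# sampling structure is illustrated.

STEPS_PER_SUBGOAL = 5  # low-level steps per high-level step (assumed)
EFFECT_SCALE = 1.0     # hypothetical magnitude of one high-level effect


def meta_dynamics(state, effect_code):
    """Hypothetical high-level model: predict the state reached after
    STEPS_PER_SUBGOAL low-level steps, given an effect code c."""
    return (state[0] + EFFECT_SCALE * effect_code[0],
            state[1] + EFFECT_SCALE * effect_code[1])


def action_generator(state, subgoal, motion_code):
    """Hypothetical low-level generator: map (state, subgoal, z) to a
    sequence of actions; z perturbs each step around a straight-line motion."""
    dx = (subgoal[0] - state[0]) / STEPS_PER_SUBGOAL
    dy = (subgoal[1] - state[1]) / STEPS_PER_SUBGOAL
    return [(dx + 0.1 * zx, dy + 0.1 * zy) for zx, zy in motion_code]


def dynamics(state, action):
    """Hypothetical low-level model: the point moves by the action."""
    return (state[0] + action[0], state[1] + action[1])


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def gaussian(n, rng):
    """Draw n 2D latent codes from a standard normal prior."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


def plan(state, goal, horizon=3, num_effects=256, num_motions=32, seed=0):
    rng = random.Random(seed)
    # 1) High level: sample sequences of effect codes and keep the one whose
    #    predicted final subgoal is closest to the goal.
    best_subgoals, best_cost = None, float("inf")
    for _ in range(num_effects):
        s, subgoals = state, []
        for c in gaussian(horizon, rng):
            s = meta_dynamics(s, c)
            subgoals.append(s)
        cost = dist(s, goal)
        if cost < best_cost:
            best_subgoals, best_cost = subgoals, cost
    # 2) Low level: for each subgoal, sample motion codes and keep the action
    #    sequence whose simulated end state lands closest to that subgoal.
    actions, s = [], state
    for subgoal in best_subgoals:
        best_seq, best_end, best_err = None, None, float("inf")
        for _ in range(num_motions):
            seq = action_generator(s, subgoal, gaussian(STEPS_PER_SUBGOAL, rng))
            end = s
            for a in seq:
                end = dynamics(end, a)
            err = dist(end, subgoal)
            if err < best_err:
                best_seq, best_end, best_err = seq, end, err
        actions.extend(best_seq)
        s = best_end
    return actions, s


if __name__ == "__main__":
    acts, final = plan(state=(0.0, 0.0), goal=(2.0, -1.0))
    print(len(acts), round(dist(final, (2.0, -1.0)), 2))
```

The high-level search touches only `horizon` predictions per sample, while the expensive step-by-step rollouts are spent on reaching one chosen subgoal at a time; this is the kind of saving that modeling dynamics at two temporal resolutions is meant to provide.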