DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability
This work addresses the challenge of applying Task and Motion Planning (TAMP) to partially observable robot manipulation tasks, offering a hybrid approach that uses learned generative models as the samplers and constraints inside a TAMP solver.
The paper tackles multi-step constraint reasoning in robot manipulation under partial observability by combining diffusion models with Task and Motion Planning (TAMP), enabling planning in domains where the environment and its dynamics are not fully known. It evaluates the approach in a simulated articulated object manipulation domain, where the combination enables multi-step constraint-based reasoning, and also applies the learned sampler in the real world.
Generative models such as diffusion models excel at capturing high-dimensional distributions with diverse input modalities, e.g., robot trajectories, but are less effective at multi-step constraint reasoning. Task and Motion Planning (TAMP) approaches are suited for planning multi-step autonomous robot manipulation. However, they can be difficult to apply to domains where the environment and its dynamics are not fully known. We propose to overcome these limitations by composing diffusion models using a TAMP system. We use learned components for the constraints and samplers that are difficult to engineer in the planning model, and use a TAMP solver to search for a task plan with constraint-satisfying action parameter values. To tractably make predictions for unseen objects in the environment, we define the learned samplers and TAMP operators on learned latent embeddings of changing object states. We evaluate our approach in a simulated articulated object manipulation domain and show how the combination of classical TAMP, generative modeling, and latent embedding enables multi-step constraint-based reasoning. We also apply the learned sampler in the real world. Website: https://sites.google.com/view/dimsam-tamp
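To make the division of labor in the abstract concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a one-dimensional stand-in for a learned diffusion sampler (noise is iteratively pulled toward a fixed center, with no trained network), a two-step plan skeleton, and a rejection loop playing the role of the TAMP solver's search over action parameters. All names (`toy_diffusion_sampler`, `solve`, `SKELETON`, `CONSTRAINTS`, the "grasp" and "pull" parameters, and the numeric limits) are hypothetical.

```python
import random


def toy_diffusion_sampler(center, rng, steps=50):
    """Toy stand-in for a learned diffusion sampler (assumption: no network).

    Starts from Gaussian noise and repeatedly moves toward `center` while the
    injected noise shrinks, loosely mimicking a reverse-diffusion process.
    """
    x = rng.gauss(0.0, 1.0)
    for t in range(steps, 0, -1):
        noise_scale = 0.1 * t / steps  # noise decreases as t approaches 0
        x += 0.2 * (center - x) + noise_scale * rng.gauss(0.0, 1.0)
    return x


def solve(skeleton, constraints, rng, max_attempts=100):
    """Sample parameters for each action in order and keep the first
    binding that satisfies every constraint; return None on failure."""
    for _ in range(max_attempts):
        bindings = {}
        for name, sampler in skeleton:
            # Later samplers may condition on earlier parameter values.
            bindings[name] = sampler(bindings, rng)
        if all(check(bindings) for check in constraints):
            return bindings
    return None


# Hypothetical two-step skeleton: choose a grasp point, then a pull target
# conditioned on that grasp.
SKELETON = [
    ("grasp", lambda b, rng: toy_diffusion_sampler(0.5, rng)),
    ("pull", lambda b, rng: toy_diffusion_sampler(b["grasp"] + 0.3, rng)),
]

# Hypothetical constraints: a joint limit and a minimum opening distance.
CONSTRAINTS = [
    lambda b: b["pull"] <= 0.82,
    lambda b: b["pull"] - b["grasp"] >= 0.25,
]

if __name__ == "__main__":
    plan = solve(SKELETON, CONSTRAINTS, random.Random(0))
    print(plan)
```

In this sketch the samplers only propose values; feasibility across steps is decided by the outer search, which mirrors the abstract's split between learned samplers and a solver that looks for constraint-satisfying action parameters. A real system would replace the rejection loop with a full TAMP solver and the toy sampler with trained models operating on latent object embeddings.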