LG AI

Jan 5, 2024

Simple Hierarchical Planning with Diffusion

Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

arXiv:2401.02644v131.990 citations

h-index: 16

ICLR

Originality: Highly original

AI Analysis

The paper addresses computational bottlenecks and generalization issues in diffusion-based planning for offline reinforcement learning, offering a more efficient and effective method for long-horizon tasks.

The paper tackles the computational and generalization challenges of diffusion-based planning in long-horizon tasks by introducing the Hierarchical Diffuser, which combines hierarchical and diffusion-based planning with a 'jumpy' strategy. It demonstrates superior performance and efficiency on offline reinforcement learning benchmarks, including improved generalization on compositional out-of-distribution tasks.

Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets. However, they often face computational challenges and can falter in generalization, especially in capturing temporal abstractions for long-horizon tasks. To overcome this, we introduce the Hierarchical Diffuser, a simple, fast, yet surprisingly effective planning method combining the advantages of hierarchical and diffusion-based planning. Our model adopts a "jumpy" planning strategy at the higher level, which allows it to have a larger receptive field but at a lower computational cost -- a crucial factor for diffusion-based planning methods, as we have empirically verified. Additionally, the jumpy sub-goals guide our low-level planner, facilitating a fine-tuning stage and further improving our approach's effectiveness. We conducted empirical evaluations on standard offline reinforcement learning benchmarks, demonstrating our method's superior performance and efficiency in terms of training and planning speed compared to the non-hierarchical Diffuser as well as other hierarchical planning methods. Moreover, we explore our model's generalization capability, particularly how our method improves generalization on compositional out-of-distribution tasks.
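The "jumpy" strategy described in the abstract can be illustrated with a small data-preparation sketch. It is one plausible reading of the idea, not the authors' implementation: the jump interval `k`, the helper names `jumpy_subsample` and `low_level_segments`, and the toy trajectory are all assumptions made for illustration, and no diffusion model is involved. The sketch only shows how a dense trajectory might be reduced to sparse sub-goals for a high-level planner and cut into short segments for a low-level planner.

```python
import numpy as np


def jumpy_subsample(states: np.ndarray, k: int) -> np.ndarray:
    """Keep every k-th state as a sub-goal (hypothetical helper).

    A high-level planner trained on these sparse states covers the same
    time span with roughly 1/k as many steps.
    """
    return states[::k]


def low_level_segments(states: np.ndarray, k: int) -> list:
    """Cut the dense trajectory into segments of k + 1 states (hypothetical helper).

    Each segment starts at one sub-goal and ends at the next, so a
    low-level planner only has to fill in a short horizon.
    """
    return [states[i:i + k + 1] for i in range(0, len(states) - k, k)]


# Toy trajectory: 13 time steps of a 2-dimensional state (assumed values).
horizon, state_dim, k = 13, 2, 4
states = np.arange(horizon * state_dim, dtype=float).reshape(horizon, state_dim)

subgoals = jumpy_subsample(states, k)      # states at steps 0, 4, 8, 12
segments = low_level_segments(states, k)   # steps 0-4, 4-8, 8-12

print(subgoals.shape)                      # (4, 2)
print(len(segments), segments[0].shape)    # 3 (5, 2)
```

With `k = 4`, the high-level sequence has 4 entries instead of 13, which reflects the abstract's point about a larger receptive field at lower computational cost; the choice of `k` here is arbitrary.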
