LG | AI | RO | Oct 17, 2024

Latent Weight Diffusion: Generating reactive policies instead of trajectories

arXiv:2410.14040v2 | 3 citations | h-index: 9 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses computational efficiency and robustness in robotic control tasks. It is an incremental advance because it builds on existing diffusion methods.

The paper tackles the trade-off between performance and inference cost in diffusion-based robotic policies. It proposes Latent Weight Diffusion (LWD), which generates closed-loop policies instead of trajectories. Compared with Diffusion Policy, LWD reports higher success rates at longer action horizons and under perturbations, and comparable multitask performance at ~1/45th of the inference FLOPs.

With the increasing availability of open-source robotic data, imitation learning has emerged as a viable approach for both robot manipulation and locomotion. Currently, large generalized policies are trained to predict controls or trajectories using diffusion models, which have the desirable property of learning multimodal action distributions. However, generalizability comes at a cost, namely larger model size and slower inference. This is especially an issue for robotic tasks that require high control frequency. Further, there is a known trade-off between performance and action horizon for Diffusion Policy (DP), a popular model for generating trajectories: with fewer diffusion queries (longer action horizons), trajectory-tracking errors accumulate. For these reasons, it is common practice to run these models at high inference frequency, subject to the robot's computational constraints. To address these limitations, we propose Latent Weight Diffusion (LWD), a method that uses diffusion to generate closed-loop policies (weights for neural policies) for robotic tasks, rather than generating trajectories. Learning the behavior distribution in parameter space rather than trajectory space offers two key advantages: (1) longer action horizons (fewer diffusion queries) and robustness to perturbations while retaining high performance, and (2) a lower inference compute cost. To this end, we show that LWD has higher success rates than DP when the action horizon is longer and when stochastic perturbations exist in the environment. Furthermore, LWD achieves multitask performance comparable to DP while requiring just ~1/45th of the inference-time FLOPs.
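The contrast between generating trajectories and generating closed-loop policies can be illustrated with a toy sketch. This is not the LWD method or Diffusion Policy: there is no diffusion model here, and the 1-D dynamics, the function name `rollout`, and the fixed gain `k` (a stand-in for policy weights that a generator would sample) are all assumptions made for illustration. It only shows why an open-loop action chunk drifts under perturbations as the action horizon grows, while a policy that reacts to each observation does not, for the same number of generator queries.

```python
import random


def rollout(horizon, steps, noise_std, closed_loop, seed=0):
    """Toy 1-D task: drive the state x from 1.0 toward the goal 0.

    Dynamics (assumed for illustration): x <- x + a + noise.
    A hypothetical generator is queried once every `horizon` steps.
      - open loop:   it returns a chunk of `horizon` actions, planned from
                     the current state with a noise-free model
      - closed loop: it returns policy weights (here a single gain k), and
                     the policy a = -k * x is evaluated on every new state
    Returns (mean absolute distance to goal, number of generator queries).
    """
    rng = random.Random(seed)
    k = 0.5            # stand-in for generated policy weights (assumption)
    x = 1.0
    gain = k
    plan = []
    queries = 0
    total_err = 0.0
    for t in range(steps):
        if t % horizon == 0:
            queries += 1
            if closed_loop:
                gain = k  # "sample" new policy weights
            else:
                # Plan an action chunk by simulating the noise-free model.
                plan = []
                x_pred = x
                for _ in range(horizon):
                    a_plan = -k * x_pred
                    plan.append(a_plan)
                    x_pred += a_plan
        # Closed loop reacts to the current state; open loop replays the plan.
        a = -gain * x if closed_loop else plan[t % horizon]
        x = x + a + rng.gauss(0.0, noise_std)
        total_err += abs(x)
    return total_err / steps, queries


if __name__ == "__main__":
    for h in (1, 4, 16):
        open_err, q = rollout(h, 2000, 0.1, closed_loop=False)
        closed_err, _ = rollout(h, 2000, 0.1, closed_loop=True)
        print(f"horizon={h:2d} queries={q:4d} "
              f"open-loop err={open_err:.3f} closed-loop err={closed_err:.3f}")
```

With a horizon of 1 the two modes coincide, because the plan is recomputed from the true state at every step. As the horizon grows, the open-loop error rises because disturbances accumulate within each chunk, while the closed-loop error stays at the same level with the same number of queries. The paper's claims about success rates and FLOPs concern real robotic tasks and are not reproduced by this sketch.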

