LG · AI · CL · Oct 27, 2022

Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks

arXiv:2210.15629v4 · 34 citations · h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses the problem of scaling generalist agents for robotics and AI, offering a more efficient solution for long-horizon control tasks conditioned on language, though it appears incremental by building on existing diffusion model architectures.

The paper tackles the challenge of training generalist agents across high-dimensional inputs, long horizons, and novel tasks by proposing Language Control Diffusion (LCD), a hierarchical planner that uses language to control diffusion models. It outperforms state-of-the-art methods on the CALVIN benchmark in multi-task success rates and improves inference speed by 3.3x to 15x compared to other diffusion models.

Training generalist agents is difficult across several axes, requiring us to deal with high-dimensional inputs (space), long horizons (time), and generalization to novel tasks. Recent architectural advances have improved scaling along one or two of these axes, but remain computationally prohibitive to use. In this paper, we propose to address all three axes by leveraging Language to Control Diffusion models as a hierarchical planner conditioned on language (LCD). We effectively and efficiently scale diffusion models for planning in extended temporal, state, and task dimensions to tackle long-horizon control problems conditioned on natural language instructions, as a step towards generalist agents. On the CALVIN language robotics benchmark, LCD outperforms other state-of-the-art methods in multi-task success rates, while improving inference speed over comparable diffusion models by 3.3x to 15x. We show that LCD can successfully leverage the unique strength of diffusion models to produce coherent long-range plans while addressing their weakness in generating low-level details and control.
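The hierarchical split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name below (`embed_instruction`, `plan_subgoals`, `low_level_step`, `run_episode`) is hypothetical, the "language encoder" is a seeded pseudo-random vector, and the "denoiser" is a hand-written contraction toward a straight-line plan rather than a learned diffusion model. It only shows the control flow: a high-level planner iteratively refines a noisy sequence of subgoals conditioned on the instruction, and a low-level controller handles the fine-grained steps between subgoals.

```python
import random


def embed_instruction(text, dim=2):
    # Hypothetical stand-in for a language encoder: a deterministic
    # pseudo-random vector in [-1, 1]^dim derived from the characters.
    rng = random.Random(sum(ord(c) for c in text))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def plan_subgoals(state, goal, horizon=4, denoise_steps=10, seed=0):
    # High-level planner (toy): start from Gaussian noise and repeatedly
    # "denoise" toward a clean plan. The clean plan here is an assumption:
    # a straight line from the current state to the goal.
    rng = random.Random(seed)
    dim = len(state)
    plan = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(horizon)]
    for _ in range(denoise_steps):
        for k in range(horizon):
            frac = (k + 1) / horizon
            target = [s + frac * (g - s) for s, g in zip(state, goal)]
            # Each step halves the remaining error, standing in for one
            # reverse-diffusion update of a learned model.
            plan[k] = [p + 0.5 * (t - p) for p, t in zip(plan[k], target)]
    return plan


def low_level_step(state, subgoal, max_step=0.1):
    # Low-level controller (toy): move toward the subgoal with a clipped
    # per-dimension step, handling the detail the planner leaves out.
    return [s + max(-max_step, min(max_step, g - s))
            for s, g in zip(state, subgoal)]


def run_episode(instruction, state, horizon=4, steps_per_subgoal=20):
    # The planner runs once per episode; the controller runs every step.
    goal = embed_instruction(instruction, dim=len(state))
    for subgoal in plan_subgoals(state, goal, horizon=horizon):
        for _ in range(steps_per_subgoal):
            state = low_level_step(state, subgoal)
    return state, goal


if __name__ == "__main__":
    final_state, goal = run_episode("open the drawer", [0.0, 0.0])
    print("goal:", goal)
    print("final state:", final_state)
```

In this sketch the expensive iterative refinement happens only over a short sequence of subgoals, while the cheap controller runs at every time step. That is one plausible reading of why a hierarchical design could reduce inference cost, though the abstract itself does not detail the mechanism.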

Code Implementations: 1 repo