RO · CV · Jun 14, 2024

Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation

arXiv:2406.09738v1 · 20 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of enabling robots to perform diverse manipulation tasks from language commands. Its contribution appears incremental, consisting of specific architectural improvements.

The paper tackles language-guided multi-task robotic manipulation by introducing Sigma-Agent, an end-to-end imitation learning agent that combines contrastive imitation learning modules with a multi-view querying Transformer (MVQ-Former). On 18 RLBench tasks it surpasses RVT by an average of 5.2% with 10 training demonstrations and 5.9% with 100, and a single policy reaches a 62% success rate across 5 real-world manipulation tasks.

Developing robots capable of executing various manipulation tasks, guided by natural language instructions and visual observations of intricate real-world environments, remains a significant challenge in robotics. Such robot agents need to understand linguistic commands and distinguish between the requirements of different tasks. In this work, we present Sigma-Agent, an end-to-end imitation learning agent for multi-task robotic manipulation. Sigma-Agent incorporates contrastive Imitation Learning (contrastive IL) modules to strengthen vision-language and current-future representations. An effective and efficient multi-view querying Transformer (MVQ-Former) for aggregating representative semantic information is introduced. Sigma-Agent shows substantial improvement over state-of-the-art methods under diverse settings in 18 RLBench tasks, surpassing RVT by an average of 5.2% and 5.9% in 10 and 100 demonstration training, respectively. Sigma-Agent also achieves 62% success rate with a single policy in 5 real-world manipulation tasks. The code will be released upon acceptance.
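The abstract does not say which objective the contrastive IL modules use. As a rough illustration only, the sketch below assumes an InfoNCE-style loss, a common choice for aligning paired embeddings such as vision and language features (or current and future state features). The function names, the temperature value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss over a batch (assumed objective, not the paper's).

    queries[i] and keys[i] form the positive pair; every other key in the
    batch acts as a negative for queries[i].
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Temperature-scaled cosine similarity of query i against every key.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching key as the correct class.
        total += log_denominator - logits[i]
    return total / len(q)


# Toy 2-D embeddings standing in for vision and language features.
vision = [[1.0, 0.0], [0.0, 1.0]]
language_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each row matches its vision row
language_swapped = [[0.1, 0.9], [0.9, 0.1]]   # rows deliberately mismatched

aligned_loss = info_nce(vision, language_aligned)
swapped_loss = info_nce(vision, language_swapped)
print(aligned_loss < swapped_loss)
```

Under this assumed objective, correctly paired embeddings produce a much lower loss than mismatched ones, which is the pressure that would pull an instruction's representation toward the observations it describes and away from those of other tasks.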

