RO | CV | Jan 8, 2024

Language-Conditioned Robotic Manipulation with Fast and Slow Thinking

arXiv:2401.04181v2 | 27 citations | h-index: 26 | ICRA
Originality: Incremental advance
AI Analysis

This paper addresses robotic manipulation for tasks that require language understanding, but it appears incremental because it adapts an existing cognitive theory to robotics.

The paper tackles language-conditioned robotic manipulation by introducing RFST, a framework inspired by dual process theory that uses fast and slow thinking systems to handle tasks from simple to complex. Results show it adeptly manages intricate tasks requiring intent recognition and reasoning in both simulation and real-world scenarios.

Language-conditioned robotic manipulation aims to translate natural language instructions into executable actions, from simple pick-and-place to tasks requiring intent recognition and visual reasoning. Inspired by the dual process theory in cognitive science, which suggests two parallel systems of fast and slow thinking in human decision-making, we introduce Robotics with Fast and Slow Thinking (RFST), a framework that mimics human cognitive architecture to classify tasks and make decisions with one of two systems based on instruction type. RFST consists of two key components: 1) an instruction discriminator that determines which system should be activated based on the current user instruction, and 2) a slow-thinking system, composed of a fine-tuned vision-language model aligned with the policy networks, which allows the robot to recognize user intention or perform reasoning tasks. To assess our methodology, we built a dataset featuring real-world trajectories, capturing actions ranging from spontaneous impulses to tasks requiring deliberate contemplation. Our results, both in simulation and in real-world scenarios, confirm that our approach adeptly manages intricate tasks that demand intent recognition and reasoning. The project is available at https://jlm-z.github.io/RSFT/
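The routing idea in the abstract (a discriminator that selects either a fast or a slow system per instruction) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `DualSystemRouter`, the keyword heuristic in `toy_discriminator`, and the two toy policies are hypothetical stand-ins. In the paper the discriminator and the slow system are learned models, and actions are robot commands rather than strings.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: an "action" is a plain string here, purely for illustration.
Action = str
Policy = Callable[[str], List[Action]]


@dataclass
class DualSystemRouter:
    """Hypothetical dispatcher: sends an instruction to a fast or slow system."""
    discriminator: Callable[[str], bool]  # True means slow thinking is needed
    fast_system: Policy
    slow_system: Policy

    def act(self, instruction: str) -> List[Action]:
        # Route on the discriminator's decision for this instruction.
        if self.discriminator(instruction):
            return self.slow_system(instruction)
        return self.fast_system(instruction)


def toy_discriminator(instruction: str) -> bool:
    # Assumption: a keyword heuristic stands in for a learned discriminator.
    cues = ("thirsty", "largest", "same color", "because")
    text = instruction.lower()
    return any(cue in text for cue in cues)


def toy_fast_policy(instruction: str) -> List[Action]:
    # Stand-in for a direct instruction-to-action policy.
    return [f"fast:execute({instruction})"]


def toy_slow_policy(instruction: str) -> List[Action]:
    # Stand-in for a reasoning step followed by execution of the derived plan.
    return [f"slow:reason({instruction})", "slow:execute(plan)"]


router = DualSystemRouter(toy_discriminator, toy_fast_policy, toy_slow_policy)
simple = router.act("pick up the red block")
complex_ = router.act("I am thirsty")
```

Here a direct command such as "pick up the red block" takes the fast path, while an instruction that only implies a goal ("I am thirsty") is sent to the slow path for reasoning first. Keeping the discriminator as a separate callable means the routing rule can be swapped without touching either policy.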
