RO · AI · CV · Jul 31, 2024

DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations

arXiv:2407.21267v1 · 1 citation · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses robust and efficient robotic manipulation planning from verbal commands, offering a solution that works in complex environments without requiring demonstrations, though it builds incrementally on existing methods.

The paper tackles the problem of language-directed manipulation in tightly packed environments with sparse camera views by introducing DEF-oriCORN, which uses a novel object-based scene representation and diffusion-model-based state estimation to enable efficient and robust planning without demonstrations. It achieves superior estimation and motion planning performance compared to state-of-the-art baselines and zero-shot generalizes to real-world scenarios with diverse materials, including transparent and reflective objects.

We present DEF-oriCORN, a framework for language-directed manipulation tasks. By leveraging a novel object-based scene representation and a diffusion-model-based state estimation algorithm, our framework enables efficient and robust manipulation planning in response to verbal commands, even in tightly packed environments with sparse camera views, without any demonstrations. Unlike traditional representations, our representation affords efficient collision checking and language grounding. Compared to state-of-the-art baselines, our framework achieves superior estimation and motion planning performance from sparse RGB images and generalizes zero-shot to real-world scenarios with diverse materials, including transparent and reflective objects, despite being trained exclusively in simulation. Our code for data generation, training, and inference, along with pre-trained weights, is publicly available at: https://sites.google.com/view/def-oricorn/home.

