RO | AI | Jun 16, 2022

Equivariant Descriptor Fields: SE(3)-Equivariant Energy-Based Models for End-to-End Visual Robotic Manipulation Learning

arXiv:2206.08321v30.3874 citations | h-index: 28 | Has Code
AI Analysis: 85

This work addresses the need for large demonstration datasets in robotic manipulation learning, offering a sample-efficient approach that could reduce training costs and improve adaptability in robotics.

The paper tackles sample inefficiency in end-to-end visual robotic manipulation learning by introducing SE(3)-equivariant energy-based models, achieving high sample efficiency with only 5-10 demonstrations and demonstrating generalization to unseen object poses, instances, and distractors.

End-to-end learning for visual robotic manipulation is known to suffer from sample inefficiency, requiring large numbers of demonstrations. Spatial roto-translation equivariance, i.e., SE(3)-equivariance, can be exploited to improve the sample efficiency of learning robotic manipulation. In this paper, we present SE(3)-equivariant models for visual robotic manipulation from point clouds that can be trained fully end-to-end. Using the representation theory of Lie groups, we construct novel SE(3)-equivariant energy-based models that allow highly sample-efficient end-to-end learning. We show that our models can learn from scratch without prior knowledge and yet are highly sample efficient (5 to 10 demonstrations are enough). Furthermore, we show that our models can generalize to tasks with (i) previously unseen target object poses, (ii) previously unseen target object instances of the category, and (iii) previously unseen visual distractors. We experiment with 6-DoF robotic manipulation tasks to validate our models' sample efficiency and generalizability. Code is available at: https://github.com/tomato1mule/edf
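To make the central property concrete, the following is a minimal sketch of what SE(3)-equivariance means for a function of a point cloud: applying a rigid-body motion g = (R, t) to the input and then evaluating f gives the same result as evaluating f first and then applying g. This is not the paper's model and does not use the linked repository. The functions `centroid` and `bbox_max` and the helper `max_equivariance_error` are hypothetical names chosen for illustration, and the point cloud is random toy data.

```python
import math
import random

def rotation_z(theta):
    """Rotation matrix about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_x(theta):
    """Rotation matrix about the x axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transform(rot, trans, point):
    """Apply the rigid-body motion g = (rot, trans) to one 3D point."""
    return [sum(rot[i][k] * point[k] for k in range(3)) + trans[i]
            for i in range(3)]

def centroid(cloud):
    """Mean of the points: an SE(3)-equivariant map from clouds to points."""
    n = len(cloud)
    return [sum(p[i] for p in cloud) / n for i in range(3)]

def bbox_max(cloud):
    """Axis-aligned bounding-box corner: NOT equivariant under rotations."""
    return [max(p[i] for p in cloud) for i in range(3)]

def max_equivariance_error(f, cloud, rot, trans):
    """Largest coordinate gap between f(g . cloud) and g . f(cloud)."""
    moved_then_mapped = f([transform(rot, trans, p) for p in cloud])
    mapped_then_moved = transform(rot, trans, f(cloud))
    return max(abs(a - b) for a, b in zip(moved_then_mapped, mapped_then_moved))

# Toy data: a random point cloud and an arbitrary rigid-body motion.
rng = random.Random(0)
cloud = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(100)]
rot = matmul(rotation_z(0.7), rotation_x(-1.2))
trans = [0.5, -2.0, 1.0]

equivariant_error = max_equivariance_error(centroid, cloud, rot, trans)
non_equivariant_error = max_equivariance_error(bbox_max, cloud, rot, trans)
print("centroid error:", equivariant_error)
print("bbox_max error:", non_equivariant_error)
```

The centroid error is at the level of floating-point round-off, while the bounding-box corner error is not, because an axis-aligned box depends on the choice of coordinate frame. A model with the equivariance built in does not need to see the same object in many poses to handle a new pose, which is the intuition behind the sample-efficiency claim in the abstract.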

Code Implementations: 1 repo