CV · AI · Jan 30, 2022

Contrastive Learning from Demonstrations

arXiv:2201.12813v2 · 3 citations
AI Analysis

This work addresses efficient visual representation learning for robotic imitation, though the contribution appears incremental, since it optimizes an existing self-supervised method.

The paper learns visual representations from unlabeled multi-view video demonstrations for robotic imitation tasks such as pick and place. The authors' contrastive learning method improves results on metrics such as viewpoint alignment and stage classification compared to state-of-the-art approaches, while requiring fewer training iterations.

This paper presents a framework for learning visual representations from unlabeled video demonstrations captured from multiple viewpoints. We show that these representations are applicable for imitating several robotic tasks, including pick and place. We optimize a recently proposed self-supervised learning algorithm by applying contrastive learning to enhance task-relevant information while suppressing irrelevant information in the feature embeddings. We validate the proposed method on the publicly available Multi-View Pouring data set and a custom Pick and Place data set, and compare it with the TCN triplet baseline. We evaluate the learned representations using three metrics: viewpoint alignment, stage classification and reinforcement learning. In all cases the results improve on state-of-the-art approaches, with the added benefit of a reduced number of training iterations.
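To make the contrast with the TCN triplet baseline concrete, the sketch below compares two objectives on embeddings of synchronized frames from two viewpoints. The abstract does not state the exact contrastive loss the authors use, so this assumes a standard InfoNCE objective (co-occurring frames across views are positives, all other frames in the batch are negatives) next to a generic TCN-style triplet loss. The function names, the temperature of 0.1 and the margin of 0.2 are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row (one embedding per frame) to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def multiview_info_nce(view_a: np.ndarray, view_b: np.ndarray,
                       temperature: float = 0.1) -> float:
    """Assumed InfoNCE loss over N synchronized frames seen from two viewpoints.

    Row i of view_a and row i of view_b embed the same time step, so they form
    the positive pair; every other row of view_b serves as a negative.
    """
    a = l2_normalize(view_a)
    b = l2_normalize(view_b)
    logits = a @ b.T / temperature                 # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))     # cross-entropy with targets on the diagonal


def tcn_triplet(anchor: np.ndarray, positive: np.ndarray, negative: np.ndarray,
                margin: float = 0.2) -> float:
    """Generic TCN-style triplet loss: each anchor should be closer to the
    co-occurring frame from another view than to a different frame."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


# Toy usage with random stand-ins for learned embeddings (hypothetical data).
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 32))                          # view A: 8 time steps, 32-d features
aligned = frames + 0.01 * rng.normal(size=frames.shape)    # view B at the same time steps
misaligned = np.roll(aligned, 1, axis=0)                   # view B shifted by one time step

print(multiview_info_nce(frames, aligned) < multiview_info_nce(frames, misaligned))  # True
print(tcn_triplet(frames, aligned, np.roll(frames, 1, axis=0)))                      # 0.0
```

The design difference is that the triplet loss compares each anchor against a single negative, while the InfoNCE form scores every positive against all other frames in the batch at once, which is one plausible reason a contrastive objective could need fewer training iterations.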
