RO · LG · Dec 26, 2020

Multi-Instance Aware Localization for End-to-End Imitation Learning

arXiv:2101.01053v1
Originality: Incremental advance
AI Analysis

This work improves the sample efficiency and localization accuracy of end-to-end imitation learning for robotic manipulation, which matters most when only a small number of expert demonstrations is available.

This paper addresses the poor performance of image-to-action policy networks in imitation learning when the input image contains multiple instances of the object of interest and training data is limited. The authors propose an architecture that appends an instance preference embedding to the feature map produced by the vision layers and uses an autoregressive action generator for the control layers. On a real robot, reach, push, and pick-and-place policies are trained with as few as 15 expert demonstrations.

Existing architectures for imitation learning using image-to-action policy networks perform poorly when presented with an input image containing multiple instances of the object of interest, especially when the number of expert demonstrations available for training is limited. We show that end-to-end policy networks can be trained in a sample-efficient manner by (a) appending to the feature map output of the vision layers an embedding that can indicate instance preference or take advantage of an implicit preference present in the expert demonstrations, and (b) employing an autoregressive action generator network for the control layers. The proposed architecture for localization has improved accuracy and sample efficiency and can generalize to the presence of more object instances than were seen during training. When used for end-to-end imitation learning to perform reach, push, and pick-and-place tasks on a real robot, training is achieved with as few as 15 expert demonstrations.
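The two ingredients named in the abstract, (a) appending a preference embedding to the vision features and (b) generating the action autoregressively, can be illustrated with a small sketch. This is not the authors' network: the feature vector is a random stand-in for a flattened vision feature map, the one-hot preference vectors, layer sizes, tanh squashing, and all function names are assumptions made for illustration, and the weights are random and untrained.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Create a small dense layer with random weights (untrained, illustrative)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Apply a dense layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def append_preference(features, preference):
    """Step (a): concatenate vision features with an instance-preference embedding."""
    return list(features) + list(preference)


def build_generator(rng, context_dim, action_dim):
    """One single-output layer per action dimension; layer i also sees the i earlier outputs."""
    return [init_layer(rng, context_dim + i, 1) for i in range(action_dim)]


def autoregressive_actions(context, layers):
    """Step (b): emit action dimensions one at a time, each conditioned on the earlier ones."""
    actions = []
    for weights, bias in layers:
        out = linear(weights, bias, context + actions)
        actions.append(math.tanh(out[0]))  # assumed squashing to (-1, 1)
    return actions


rng = random.Random(0)
features = [rng.random() for _ in range(8)]  # stand-in for a flattened feature map
prefer_left = [1.0, 0.0]   # hypothetical one-hot preference: leftmost instance
prefer_right = [0.0, 1.0]  # hypothetical one-hot preference: rightmost instance

generator = build_generator(rng, context_dim=len(features) + 2, action_dim=3)
a_left = autoregressive_actions(append_preference(features, prefer_left), generator)
a_right = autoregressive_actions(append_preference(features, prefer_right), generator)
```

With the same image features, changing only the preference embedding changes the generated action, which is the mechanism that lets a single policy disambiguate between several instances of the same object. In a trained system the preference could also be left implicit and learned from the demonstrations, as the abstract notes.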

