RO Nov 9, 2021

AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale

arXiv:2111.05424v2
62 citations
Originality: Incremental advance
AI Analysis

This work addresses the efficient learning of complex robotic skills, a problem relevant to robotics researchers and practitioners. It is an incremental advance: rather than introducing a new method, it builds on existing methods and optimizes their combination for scalability.

The paper tackles the challenge of scaling combined imitation and reinforcement learning for robotic skills. It develops AW-Opt, which integrates advantage-weighted regression and QT-Opt to make use of demonstrations and offline data, and reports improved performance on real-world and simulated robotic manipulation tasks.

Robotic skills can be learned via imitation learning (IL) using user-provided demonstrations, or via reinforcement learning (RL) using large amounts of autonomously collected experience. Both methods have complementary strengths and weaknesses: RL can reach a high level of performance, but requires exploration, which can be very time consuming and unsafe; IL does not require exploration, but only learns skills that are as good as the provided demonstrations. Can a single method combine the strengths of both approaches? A number of prior methods have aimed to address this question, proposing a variety of techniques that integrate elements of IL and RL. However, scaling up such methods to complex robotic skills that integrate diverse offline data and generalize meaningfully to real-world scenarios still presents a major challenge. In this paper, our aim is to test the scalability of prior IL + RL algorithms and devise a system based on detailed empirical experimentation that combines existing components in the most effective and scalable way. To that end, we present a series of experiments aimed at understanding the implications of each design decision, so as to develop a combined approach that can utilize demonstrations and heterogeneous prior data to attain the best performance on a range of real-world and realistic simulated robotic problems. Our complete method, which we call AW-Opt, combines elements of advantage-weighted regression [1, 2] and QT-Opt [3], providing a unified approach for integrating demonstrations and offline data for robotic manipulation. Please see https://awopt.github.io for more details.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
