LG · AI · RO · Nov 15, 2021

Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization

arXiv:2111.07999v1 · 56 citations
Originality: Highly original
AI Analysis

This work addresses the challenge of synthesizing complex robot behaviors for long-horizon tasks. It builds on existing skill chaining methods but introduces a novel regularization approach.

The paper tackles skill chaining for long-horizon robot manipulation, where a naively chained policy fails when it encounters a starting state it never saw during training. It proposes adversarial terminal state regularization, which avoids the excessively large initial state distributions that prior methods require. The resulting model-free reinforcement learning algorithm solves two complex furniture assembly tasks on which prior skill chaining methods fail, and the authors report that it is the first model-free reinforcement learning algorithm to do so.

Skill chaining is a promising approach for synthesizing complex behaviors by sequentially combining previously learned skills. Yet a naive composition of skills fails when a policy encounters a starting state never seen during its training. For successful skill chaining, prior approaches attempt to widen the policy's starting state distribution. However, these approaches require larger state distributions to be covered as more policies are sequenced, and thus are limited to short skill sequences. In this paper, we propose to chain multiple policies without excessively large initial state distributions by regularizing the terminal state distributions in an adversarial learning framework. We evaluate our approach on two complex long-horizon manipulation tasks of furniture assembly. Our results show that our method is the first model-free reinforcement learning algorithm to solve these tasks, whereas prior skill chaining approaches fail. The code and videos are available at https://clvrai.com/skill-chaining
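The idea of regularizing terminal states adversarially can be illustrated with a small sketch. This is not the authors' implementation (their code is at the URL above). It assumes a logistic-regression discriminator over toy 2-D states and a log-probability bonus at episode end. The names `TerminalStateDiscriminator`, `regularization_reward`, and `demo` are hypothetical and chosen only for illustration. The discriminator learns to tell apart states the next skill was trained to start from (label 1) and terminal states of the current skill (label 0). The current skill then receives a bonus that grows as its terminal state looks more like a familiar starting state.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TerminalStateDiscriminator:
    """Hypothetical logistic-regression discriminator D(s) in (0, 1).

    D(s) near 1: s resembles a state the next skill was trained to start from.
    D(s) near 0: s resembles a terminal state of the current skill.
    """

    def __init__(self, state_dim: int, lr: float = 0.1, rng=None) -> None:
        self.w = [0.0] * state_dim
        self.b = 0.0
        self.lr = lr
        self.rng = rng or random.Random(0)

    def prob(self, state) -> float:
        z = self.b + sum(w * x for w, x in zip(self.w, state))
        return sigmoid(z)

    def update(self, next_skill_starts, current_terminals) -> None:
        """One SGD pass on binary cross-entropy (1 = start state, 0 = terminal state)."""
        data = [(s, 1.0) for s in next_skill_starts]
        data += [(s, 0.0) for s in current_terminals]
        self.rng.shuffle(data)
        for state, label in data:
            err = self.prob(state) - label  # gradient of the loss w.r.t. the logit
            self.w = [w - self.lr * err * x for w, x in zip(self.w, state)]
            self.b -= self.lr * err

    def regularization_reward(self, terminal_state, eps: float = 1e-8) -> float:
        """Assumed bonus form: log D(s_T), always <= 0, higher for familiar states."""
        return math.log(max(self.prob(terminal_state), eps))


def demo(seed: int = 0):
    """Train on toy data: start states cluster near (1, 1), terminals near (-1, -1)."""
    rng = random.Random(seed)
    starts = [[rng.gauss(1.0, 0.1), rng.gauss(1.0, 0.1)] for _ in range(200)]
    terminals = [[rng.gauss(-1.0, 0.1), rng.gauss(-1.0, 0.1)] for _ in range(200)]
    disc = TerminalStateDiscriminator(state_dim=2, rng=rng)
    for _ in range(5):
        disc.update(starts, terminals)
    familiar = disc.regularization_reward([1.0, 1.0])
    unfamiliar = disc.regularization_reward([-1.0, -1.0])
    return disc, familiar, unfamiliar
```

In a full training loop, this bonus would be added to the task reward of the current skill while the discriminator keeps being updated on fresh terminal states, so the two are trained against each other. The exact network, loss, and reward weighting used in the paper may differ from this sketch.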

