RO CV Feb 28, 2022

Weakly Supervised Disentangled Representation for Goal-conditioned Reinforcement Learning

Zhifeng Qian, Mingyu You, Hongjun Zhou, Bin He

arXiv:2202.13624v1

5.57 citations

Originality: Incremental advance

AI Analysis

This work addresses the high sample requirements of goal-conditioned reinforcement learning for agents that learn multiple skills in dynamic environments. It offers an incremental improvement by combining disentangled representation learning with goal-conditioned visual reinforcement learning.

The paper tackles sample inefficiency and poor generalization in goal-conditioned reinforcement learning by proposing DR-GRL, a framework that learns interpretable representations of object attributes (shape, color, position) and uses them to generate unseen goals for the agent to practice on. The authors report that DR-GRL significantly outperforms previous methods in sample efficiency and policy generalization, and state that it extends readily to a real robot.

Goal-conditioned reinforcement learning is a crucial yet challenging approach that enables agents to achieve multiple user-specified goals while learning a set of skills in a dynamic environment. However, it typically requires millions of environment interactions, which makes it sample-inefficient. In this paper, we propose DR-GRL, a skill learning framework that aims to improve sample efficiency and policy generalization by combining Disentangled Representation learning with Goal-conditioned visual Reinforcement Learning. We propose a Spatial Transform AutoEncoder (STAE), trained in a weakly supervised manner, to learn an interpretable and controllable representation in which different parts correspond to different object attributes (shape, color, position). Because the representation is highly controllable, STAE can simply recombine and recode its parts to generate unseen goals for agents to practice on by themselves. The manifold structure of the learned representation is consistent with physical position, which is beneficial for reward calculation. We empirically demonstrate that DR-GRL significantly outperforms previous methods in sample efficiency and policy generalization. In addition, DR-GRL is easy to extend to a real robot.
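The recombination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the latent layout (the `SHAPE`, `COLOR`, and `POSITION` slices and their sizes), the helper names `recombine` and `position_reward`, and the negative-distance reward are all assumptions made for illustration. The STAE encoder and decoder are omitted, and random vectors stand in for encoded observations.

```python
import numpy as np

# Hypothetical latent layout; the abstract does not give dimensions.
SHAPE = slice(0, 4)
COLOR = slice(4, 8)
POSITION = slice(8, 10)
LATENT_DIM = 10


def recombine(shape_src, color_src, position_src):
    """Build a goal latent by taking each attribute block from a different source latent."""
    goal = np.empty(LATENT_DIM)
    goal[SHAPE] = shape_src[SHAPE]
    goal[COLOR] = color_src[COLOR]
    goal[POSITION] = position_src[POSITION]
    return goal


def position_reward(state_latent, goal_latent):
    """Assumed reward: negative Euclidean distance between the position blocks."""
    diff = state_latent[POSITION] - goal_latent[POSITION]
    return -float(np.linalg.norm(diff))


# Random vectors stand in for latents produced by an encoder.
rng = np.random.default_rng(0)
z_a, z_b, z_c = (rng.normal(size=LATENT_DIM) for _ in range(3))

# An "unseen" goal: shape from z_a, color from z_b, position from z_c.
goal = recombine(z_a, z_b, z_c)
reward = position_reward(z_a, goal)
```

A distance-based reward of this kind is only meaningful if distances in the position block track physical distances, which is the property the abstract attributes to the learned representation.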
