LG · AI · RO · Aug 17, 2022

Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning

Bo Liu, Yihao Feng, Qiang Liu, Peter Stone

Apple

arXiv:2208.08133v4 · h-index: 27 · Has Code

Originality: Highly original

AI Analysis

This work addresses sample efficiency in robotics applications such as manipulation and navigation, where sparse rewards make it crucial. It is incremental in that it focuses on neural architecture design within an existing framework.
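To make "sparse rewards" concrete, here is a minimal sketch of a goal-conditioned sparse reward. The 0/-1 values and the 0.05 distance threshold are assumptions modelled on common Fetch-style robotics tasks, not details taken from this paper. Until the agent first comes within the threshold of its goal, every transition returns the same -1, so the learner gets almost no signal about which actions help.

```python
import math


def sparse_goal_reward(achieved_goal, desired_goal, threshold=0.05):
    # Assumed convention: 0 when the achieved goal is within `threshold`
    # (Euclidean distance) of the desired goal, otherwise -1.
    return 0.0 if math.dist(achieved_goal, desired_goal) <= threshold else -1.0


def episode_return(achieved_goals, desired_goal, threshold=0.05):
    # Undiscounted return of one episode: each step that misses the goal costs -1.
    return sum(sparse_goal_reward(ag, desired_goal, threshold) for ag in achieved_goals)


goal = [1.0, 0.0, 0.0]
# Five steps that never get close to the goal: every reward is identical.
missed = [[0.1 * t, 0.0, 0.0] for t in range(5)]
print(episode_return(missed, goal))  # -5.0
```

Because a failed episode and a nearly successful one can earn the same return, an architecture that generalizes well from few successes is valuable in this setting.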

The paper tackles sample efficiency in goal-conditioned reinforcement learning (GCRL) by introducing a novel neural architecture, the metric residual network (MRN). MRN decomposes the action-value function into a metric component plus an asymmetric residual, so that it respects the triangle inequality that the optimal action-value function satisfies. Across 12 benchmark environments, this yields significantly better sample efficiency than state-of-the-art GCRL architectures.

Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications, including manipulation and navigation problems in robotics. Especially in such robotics tasks, sample efficiency is of the utmost importance for GCRL since, by default, the agent is only rewarded when it reaches its goal. While several methods have been proposed to improve the sample efficiency of GCRL, one relatively under-studied approach is the design of neural architectures to support sample efficiency. In this work, we introduce a novel neural architecture for GCRL that achieves significantly better sample efficiency than the commonly used monolithic network architecture. The key insight is that the optimal action-value function Q^*(s, a, g) must satisfy the triangle inequality in a specific sense. Furthermore, we introduce the metric residual network (MRN), which deliberately decomposes the action-value function Q(s, a, g) into the negated sum of a metric and a residual asymmetric component. MRN provably approximates any optimal action-value function Q^*(s, a, g), making it a fitting neural architecture for GCRL. We conduct comprehensive experiments across 12 standard benchmark environments in GCRL. The empirical results demonstrate that MRN uniformly outperforms other state-of-the-art GCRL neural architectures in terms of sample efficiency.
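The decomposition described above can be sketched as a Q-value head that acts on two embedding vectors. This is a minimal sketch under stated assumptions, not the authors' implementation. The embeddings of (s, a) and of g are taken as given, although a real model would produce them with trained networks. The symmetric metric is assumed to be Euclidean distance. The asymmetric residual is assumed to be the max-ReLU gap `max_i max(x_i - y_i, 0)`. All function names are hypothetical. The final check is a numerical sanity test that the resulting distance -Q obeys the triangle inequality, not a proof.

```python
import math
import random


def d_sym(x, y):
    # Symmetric metric component: Euclidean distance between two embeddings.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def d_asym(x, y):
    # Assumed asymmetric residual: largest positive coordinate-wise gap,
    # max_i max(x_i - y_i, 0). It obeys the triangle inequality but is
    # generally not symmetric.
    return max(max(xi - yi, 0.0) for xi, yi in zip(x, y))


def q_value(sa_embedding, goal_embedding):
    # MRN-style head (sketch): Q(s, a, g) = -(d_sym + d_asym) evaluated on
    # hypothetical embeddings of (s, a) and g.
    return -(d_sym(sa_embedding, goal_embedding) + d_asym(sa_embedding, goal_embedding))


def count_violations(n_points=20, dim=4, seed=0):
    # Count triples (x, y, z) of random embeddings where the distance
    # -Q = d_sym + d_asym violates d(x, z) <= d(x, y) + d(y, z).
    rng = random.Random(seed)
    pts = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_points)]

    def dist(a, b):
        return d_sym(a, b) + d_asym(a, b)

    violations = 0
    for x in pts:
        for y in pts:
            for z in pts:
                if dist(x, z) > dist(x, y) + dist(y, z) + 1e-9:
                    violations += 1
    return violations


print(q_value([1.0, 0.0], [0.0, 0.0]))  # -2.0
print(q_value([0.0, 0.0], [1.0, 0.0]))  # -1.0
print(count_violations())  # 0
```

Swapping the arguments changes Q, because only the residual term is asymmetric. Both terms individually satisfy the triangle inequality, so their sum does as well. This is the structural property the abstract attributes to Q^*.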

