LG · Oct 21, 2022

Continual Vision-based Reinforcement Learning with Group Symmetries

Shiqi Liu, Mengdi Xu, Peide Huang, Yongkang Liu, Kentaro Oguchi, Ding Zhao

arXiv:2210.12301v2

Originality: Incremental advance

AI Analysis

The paper addresses a specific bottleneck in continual RL for vision-based tasks, offering incremental improvements in sample efficiency and generalization.

The paper tackles the problem of poor sample efficiency and weak generalization in continual reinforcement learning with visual inputs by recognizing that certain tasks are identical under group operations like rotations or translations. It introduces COVERS, a method that learns a policy for each group of equivalent tasks, and results show it significantly outperforms existing methods in generalization capability.

Continual reinforcement learning aims to sequentially learn a variety of tasks, retaining the ability to perform previously encountered tasks while simultaneously developing new policies for novel tasks. However, current continual RL approaches overlook the fact that certain tasks are identical under basic group operations like rotations or translations, especially with visual inputs. They may unnecessarily learn and maintain a new policy for each similar task, leading to poor sample efficiency and weak generalization capability. To address this, we introduce a unique Continual Vision-based Reinforcement Learning method that recognizes Group Symmetries, called COVERS, cultivating a policy for each group of equivalent tasks rather than individual tasks. COVERS employs a proximal policy optimization-based RL algorithm with an equivariant feature extractor and a novel task grouping mechanism that relies on the extracted invariant features. We evaluate COVERS on sequences of table-top manipulation tasks that incorporate image observations and robot proprioceptive information in both simulations and on real robot platforms. Our results show that COVERS accurately assigns tasks to their respective groups and significantly outperforms existing methods in terms of generalization capability.
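The two ideas in the abstract, features that are invariant under a group operation and task grouping based on those features, can be illustrated with a small sketch. This is not the COVERS implementation: it assumes the group is the four 90-degree image rotations (C4), replaces the learned equivariant feature extractor with a fixed random projection followed by max-pooling over the rotation orbit, and groups tasks with a simple nearest-centroid rule. The names `PROJECTION`, `invariant_features`, `assign_group`, and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical stand-in for a learned feature extractor: a fixed random
# linear map applied to flattened 8x8 images (an assumption, not from the paper).
PROJECTION = np.random.default_rng(0).standard_normal((16, 64))


def invariant_features(image):
    """Return a C4-invariant descriptor of an 8x8 image.

    The projection is applied to each of the four 90-degree rotations of the
    image, and the results are max-pooled elementwise. Rotating the input only
    permutes the members of this orbit, so the pooled output is unchanged.
    """
    orbit = [np.rot90(image, k) for k in range(4)]
    per_rotation = np.stack([PROJECTION @ r.ravel() for r in orbit])
    return per_rotation.max(axis=0)


def assign_group(feature, centroids, threshold):
    """Assign a task to the nearest existing group, or open a new group.

    `centroids` is a list of one representative feature per group and is
    modified in place when a new group is created.
    """
    if centroids:
        distances = [np.linalg.norm(feature - c) for c in centroids]
        best = int(np.argmin(distances))
        if distances[best] < threshold:
            return best
    centroids.append(feature)
    return len(centroids) - 1


# Usage: a task, a rotated copy of it, and an unrelated task.
task_a = np.zeros((8, 8))
task_a[0:2, 0:3] = 1.0
task_a_rotated = np.rot90(task_a)
task_b = np.zeros((8, 8))
task_b[:, 0] = 1.0

groups = []
labels = [
    assign_group(invariant_features(t), groups, threshold=1e-6)
    for t in (task_a, task_a_rotated, task_b)
]
print(labels)
```

In this sketch the rotated copy lands in the same group as the original because their pooled features coincide, so a single policy could be shared across both. A practical system would compare distributions of features collected over many observations rather than single images, and would use a learned extractor.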
