CV AI RO
Jan 31, 2023

CRC-RL: A Novel Visual Feature Representation Architecture for Unsupervised Reinforcement Learning

arXiv:2301.13473v21 citations
h-index: 10
AI Analysis

This paper aims to improve unsupervised reinforcement learning performance, a problem relevant to AI researchers and practitioners. The contribution appears incremental: it builds on existing methods, and its novelty lies in how it combines their loss functions.

The paper tackles visual feature representation learning to enhance end-to-end reinforcement learning models, proposing the CRC-RL architecture with a heterogeneous loss function that outperforms state-of-the-art methods on DeepMind Control Suite environments by a significant margin.

This paper addresses the problem of visual feature representation learning with the aim of improving the performance of end-to-end reinforcement learning (RL) models. Specifically, a novel architecture is proposed that uses a heterogeneous loss function, called CRC loss, to learn improved visual features which can then be used for policy learning in RL. The CRC loss function is a combination of three individual loss functions, namely, contrastive, reconstruction and consistency losses. The feature representation is learned in parallel with the policy learning, with weight updates shared through a Siamese Twin encoder model. This encoder model is augmented with a decoder network and a feature projection network to facilitate computation of the above loss components. Through empirical analysis involving latent feature visualization, an attempt is made to provide insight into the role played by this loss function in learning new action-dependent features and how they are linked to the complexity of the problems being solved. The proposed architecture, called CRC-RL, is shown to outperform the existing state-of-the-art methods on the challenging DeepMind Control Suite environments by a significant margin, thereby creating a new benchmark in this field.
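As a rough illustration of how three loss terms of this kind might be combined, here is a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the InfoNCE form of the contrastive term, the mean-squared-error form of the reconstruction and consistency terms, the choice to compare the features of the original and the reconstructed observation in the consistency term, the equal weights, and all function and parameter names.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mse(x, y):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def info_nce(queries, keys, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed form).

    queries[i] should match keys[i] and mismatch every other key in the batch.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # cross-entropy with target index i
    return total / len(queries)


def crc_loss(z_q, z_k, x, x_rec, z_rec, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of contrastive, reconstruction and consistency terms.

    z_q, z_k : features of two augmented views of each observation
    x, x_rec : flattened observations and their decoder reconstructions
    z_rec    : features of the reconstructions (assumed consistency target)
    weights  : hypothetical per-term weights, not taken from the paper
    """
    n = len(x)
    contrastive = info_nce(z_q, z_k)
    reconstruction = sum(mse(a, b) for a, b in zip(x, x_rec)) / n
    consistency = sum(mse(a, b) for a, b in zip(z_q, z_rec)) / n
    w_con, w_rec, w_cons = weights
    total = w_con * contrastive + w_rec * reconstruction + w_cons * consistency
    return total, (contrastive, reconstruction, consistency)
```

With a perfect reconstruction and matching features, the reconstruction and consistency terms vanish and the total reduces to the contrastive term alone. In a real training loop the features would come from the shared encoder and projection network, and the reconstructions from the decoder.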

Code Implementations: 1 repo