LG · AI · CV · May 20, 2022

Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions

arXiv:2205.10218v3 · 25 citations · h-index: 64
Originality: Incremental advance
AI Analysis

This paper addresses generalization in visual RL for real-world applications, but the advance is incremental because it builds on existing methods for handling visual distractions.

The paper tackles the problem of visual distractions degrading generalization in visual reinforcement learning by proposing CRESP, which learns task-relevant representations via characteristic functions of reward sequence distributions, resulting in significant performance improvements on unseen environments in DeepMind Control tasks.

Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning (RL) in real scenarios. However, visual distractions -- which are common in real scenes -- from high-dimensional observations can harm the learned representations in visual RL, thus degrading generalization performance. To tackle this problem, we propose a novel approach, namely Characteristic Reward Sequence Prediction (CRESP), to extract the task-relevant information by learning reward sequence distributions (RSDs), as the reward signals are task-relevant in RL and invariant to visual distractions. Specifically, to effectively capture the task-relevant information via RSDs, CRESP introduces an auxiliary task -- that is, predicting the characteristic functions of RSDs -- to learn task-relevant representations, because we can well approximate the high-dimensional distributions by leveraging the corresponding characteristic functions. Experiments demonstrate that CRESP significantly improves generalization performance on unseen environments, outperforming several state-of-the-art methods on DeepMind Control tasks with different visual distractions.
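
To make the role of characteristic functions concrete, here is a minimal NumPy sketch of the underlying idea, not the authors' implementation. The characteristic function of a reward sequence R of length T is φ(ω) = E[exp(i⟨ω, R⟩)], and it can be estimated from sampled sequences by averaging over samples. The function names (`empirical_cf`, `cf_distance`), the Gaussian toy reward sequences, and the Gaussian frequency sampling are all assumptions made for illustration.

```python
import numpy as np

def empirical_cf(reward_seqs, freqs):
    """Estimate the characteristic function of a reward sequence distribution.

    reward_seqs: (N, T) array of N sampled reward sequences of length T.
    freqs:       (K, T) array of K frequency vectors omega.
    Returns a complex (K,) array: mean over samples of exp(i * <omega_k, r_n>).
    """
    phases = freqs @ reward_seqs.T            # shape (K, N)
    return np.exp(1j * phases).mean(axis=1)   # average over the N samples

def cf_distance(x, y, freqs):
    """Mean squared gap between two empirical characteristic functions."""
    diff = empirical_cf(x, freqs) - empirical_cf(y, freqs)
    return float(np.mean(np.abs(diff) ** 2))

rng = np.random.default_rng(0)
T, K, N = 5, 64, 5000
freqs = rng.normal(size=(K, T))  # assumed: frequencies drawn from N(0, I)

# Toy stand-ins for reward sequences (hypothetical data, not from the paper).
same_a = rng.normal(loc=0.0, size=(N, T))
same_b = rng.normal(loc=0.0, size=(N, T))
shifted = rng.normal(loc=1.0, size=(N, T))

d_same = cf_distance(same_a, same_b, freqs)    # same distribution
d_shift = cf_distance(same_a, shifted, freqs)  # different distribution
print(d_same < d_shift)
```

Two sample sets drawn from the same distribution yield nearly identical empirical characteristic functions, while a shifted distribution yields a visibly larger gap. This is the property that makes a characteristic function a usable prediction target for a high-dimensional distribution.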

Code Implementations: 1 repo