LG, AI, CV | Feb 19, 2024

Revisiting Data Augmentation in Deep Reinforcement Learning

arXiv:2402.12181v1 | 10 citations | h-index: 2 | ICLR
Originality: Incremental advance
AI Analysis

This work addresses how researchers and practitioners in deep reinforcement learning should choose among and apply data augmentation methods. Its contribution is incremental: an analysis of existing methods and an adaptation of tangent prop regularization.

The paper analyzes existing data augmentation techniques in deep reinforcement learning to understand how they are connected and what effect each component has. From this analysis it derives a more principled approach and adds a novel adaptation of tangent prop regularization. The resulting method achieves state-of-the-art performance in most environments, with higher sample efficiency and better generalization in some complex environments.

Various data augmentation techniques have been recently proposed in image-based deep reinforcement learning (DRL). Although they empirically demonstrate the effectiveness of data augmentation for improving sample efficiency or generalization, which technique should be preferred is not always clear. To tackle this question, we analyze existing methods to better understand them and to uncover how they are connected. Notably, by expressing the variance of the Q-targets and that of the empirical actor/critic losses of these methods, we can analyze the effects of their different components and compare them. We furthermore formulate an explanation about how these methods may be affected by choosing different data augmentation transformations in calculating the target Q-values. This analysis suggests recommendations on how to exploit data augmentation in a more principled way. In addition, we include a regularization term called tangent prop, previously proposed in computer vision, but whose adaptation to DRL is novel to the best of our knowledge. We evaluate our proposition and validate our analysis in several domains. Compared to different relevant baselines, we demonstrate that it achieves state-of-the-art performance in most environments and shows higher sample efficiency and better generalization ability in some complex environments.
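To make the tangent prop idea mentioned in the abstract more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy one-unit critic, a 1-D observation, and circular translation as the augmentation; the names `q_value`, `shift_tangent`, and `tangent_prop_penalty` are hypothetical. The penalty is the squared directional derivative of the critic output along the direction in which the observation changes under a small transformation, here estimated with finite differences.

```python
import numpy as np


def q_value(weights, obs):
    """Toy critic: one tanh unit over a 1-D observation.

    Hypothetical stand-in for a Q-network; the action input is omitted.
    """
    return float(np.tanh(weights @ obs))


def shift_tangent(obs):
    """Approximate d(obs)/d(shift) for a circular shift to the right.

    Shifting right by delta gives obs'(x) = obs(x - delta), so the derivative
    with respect to delta is -d(obs)/dx, estimated by a central difference.
    """
    return (np.roll(obs, 1) - np.roll(obs, -1)) / 2.0


def tangent_prop_penalty(weights, obs, eps=1e-4):
    """Squared directional derivative of Q along the transformation tangent.

    A critic that is invariant to the transformation has a penalty near zero.
    """
    t = shift_tangent(obs)
    q_plus = q_value(weights, obs + eps * t)
    q_minus = q_value(weights, obs - eps * t)
    directional = (q_plus - q_minus) / (2.0 * eps)
    return directional ** 2


obs = np.array([0.0, 1.0, 0.0, 0.0])
invariant_w = np.full(4, 0.3)                 # same weight everywhere: shift-invariant
sensitive_w = np.array([0.1, 0.2, 0.3, 0.4])  # position-dependent weights

penalty_invariant = tangent_prop_penalty(invariant_w, obs)
penalty_sensitive = tangent_prop_penalty(sensitive_w, obs)
```

In a training loop, a term like this would be added to the critic loss with a weighting coefficient. With an automatic differentiation library, the directional derivative would normally be computed exactly instead of by finite differences.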

Code Implementations: 1 repo