LG | AI | Jan 29, 2024

The Indoor-Training Effect: unexpected gains from distribution shifts in the transition function

arXiv:2401.15856v2 | 1 citation | h-index: 9 | AAAI
Originality: Incremental advance
AI Analysis

The paper addresses a fundamental question for reinforcement learning researchers and practitioners: whether the training environment should match the testing environment. It challenges the conventional wisdom that it should, though the advance appears incremental because it builds on existing MDP frameworks.

The paper investigates whether distribution shifts in the transition probabilities between training and testing environments can improve reinforcement learning performance. It finds that agents trained on noise-free environments often outperform agents trained on noisy environments when both are tested on the noisy variations, as demonstrated across 60 ATARI game variations.

Is it better to perform tennis training in a pristine indoor environment or a noisy outdoor one? To model this problem, here we investigate whether shifts in the transition probabilities between the training and testing environments in reinforcement learning problems can lead to better performance under certain conditions. We generate new Markov Decision Processes (MDPs) starting from a given MDP, by adding quantifiable, parametric noise into the transition function. We refer to this process as Noise Injection and the resulting environments as δ-environments. This process allows us to create variations of the same environment with quantitative control over noise serving as a metric of distance between environments. Conventional wisdom suggests that training and testing on the same MDP should yield the best results. In stark contrast, we observe that agents can perform better when trained on the noise-free environment and tested on the noisy δ-environments, compared to training and testing on the same δ-environments. We confirm that this finding extends beyond noise variations: it is possible to showcase the same phenomenon in ATARI game variations including varying Ghost behaviour in PacMan, and Paddle behaviour in Pong. We demonstrate this intriguing behaviour across 60 different variations of ATARI games, including PacMan, Pong, and Breakout. We refer to this phenomenon as the Indoor-Training Effect. Code to reproduce our experiments and to implement Noise Injection can be found at https://bit.ly/3X6CTYk.
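To make the idea of a δ-environment concrete, here is a minimal sketch of injecting parametric noise into a tabular transition function. The abstract does not give the exact noise model, so the choice below is an assumption made purely for illustration: each next-state distribution is mixed with a uniform distribution using weight δ. The function name `inject_noise` and the toy two-state MDP are also hypothetical and are not taken from the authors' code.

```python
import numpy as np


def inject_noise(P: np.ndarray, delta: float) -> np.ndarray:
    """Return the transition tensor of a hypothetical delta-environment.

    P has shape (S, A, S); P[s, a] is the next-state distribution for
    taking action a in state s.

    Assumed noise model (illustrative only, not the paper's definition):
        P_delta = (1 - delta) * P + delta * Uniform
    so delta = 0 recovers the original MDP and delta = 1 makes every
    transition uniformly random.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    if not np.allclose(P.sum(axis=-1), 1.0):
        raise ValueError("each P[s, a] must sum to 1")
    n_states = P.shape[-1]
    uniform = np.full_like(P, 1.0 / n_states, dtype=float)
    # A convex combination of two distributions is itself a distribution.
    return (1.0 - delta) * P + delta * uniform


# Toy deterministic MDP: action 0 always leads to state 0,
# action 1 always leads to state 1.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0

P_delta = inject_noise(P, delta=0.2)
print(P_delta[0, 0])  # next-state distribution for state 0, action 0
```

Under this assumed model, δ acts as the quantitative knob described in the abstract: a larger δ moves the transition function further from the original MDP. The experiment then compares an agent trained on `P` and tested on `P_delta` against an agent trained and tested on `P_delta`.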

