LG · AI · Jan 29, 2024

The Indoor-Training Effect: unexpected gains from distribution shifts in the transition function

Serena Bono, Spandan Madan, Ishaan Grover, Mao Yasueda, Cynthia Breazeal, Hanspeter Pfister, Gabriel Kreiman

arXiv:2401.15856v24.61 citations · h-index: 9 · AAAI

Originality: Incremental advance

AI Analysis

This paper addresses a fundamental question in reinforcement learning, namely whether training and testing environments need to match, and challenges the conventional wisdom that they should. The contribution appears incremental, since it builds on existing MDP frameworks.

The paper investigates whether distribution shifts in transition probabilities between training and testing environments can improve reinforcement learning performance. It finds that agents trained on noise-free environments often outperform agents trained on noisy environments when both are tested on the noisy variations, a result demonstrated across 60 ATARI game variations.

Is it better to perform tennis training in a pristine indoor environment or a noisy outdoor one? To model this problem, here we investigate whether shifts in the transition probabilities between the training and testing environments in reinforcement learning problems can lead to better performance under certain conditions. We generate new Markov Decision Processes (MDPs) starting from a given MDP, by adding quantifiable, parametric noise into the transition function. We refer to this process as Noise Injection and the resulting environments as δ-environments. This process allows us to create variations of the same environment with quantitative control over noise serving as a metric of distance between environments. Conventional wisdom suggests that training and testing on the same MDP should yield the best results. In stark contrast, we observe that agents can perform better when trained on the noise-free environment and tested on the noisy δ-environments, compared to training and testing on the same δ-environments. We confirm that this finding extends beyond noise variations: it is possible to showcase the same phenomenon in ATARI game variations including varying Ghost behaviour in PacMan, and Paddle behaviour in Pong. We demonstrate this intriguing behaviour across 60 different variations of ATARI games, including PacMan, Pong, and Breakout. We refer to this phenomenon as the Indoor-Training Effect. Code to reproduce our experiments and to implement Noise Injection can be found at https://bit.ly/3X6CTYk.
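
As a rough illustration of the Noise Injection idea described above, the sketch below builds a δ-environment for a small tabular MDP. The abstract does not give the exact noise form, so the convex mixture with a random Dirichlet distribution is an assumption made here for illustration, and the names `inject_noise` and `max_tv_distance` are hypothetical rather than taken from the authors' code.

```python
import numpy as np


def inject_noise(P, delta, rng=None):
    """Return a noisy copy of a tabular transition function.

    P has shape (S, A, S) and P[s, a] is a distribution over next states.
    ASSUMPTION: noise is mixed in as (1 - delta) * P + delta * noise, where
    each noise row is drawn from a flat Dirichlet. The paper may use a
    different parametric form.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    rng = np.random.default_rng(rng)
    n_next = P.shape[-1]
    # One random distribution over next states per (state, action) pair.
    noise = rng.dirichlet(np.ones(n_next), size=P.shape[:-1])
    # A convex combination of distributions is again a distribution.
    return (1.0 - delta) * P + delta * noise


def max_tv_distance(P, Q):
    """Largest total-variation distance over all (state, action) pairs."""
    return 0.5 * np.abs(P - Q).sum(axis=-1).max()


# Toy deterministic MDP: every action leads to state 0.
P_clean = np.zeros((3, 2, 3))
P_clean[:, :, 0] = 1.0

P_delta = inject_noise(P_clean, delta=0.2, rng=0)
print("rows still sum to 1:", np.allclose(P_delta.sum(axis=-1), 1.0))
print("distance from clean MDP:", max_tv_distance(P_clean, P_delta))
```

Under this mixing rule the total-variation distance between the original and the noisy transition function never exceeds δ, which is one way δ could serve as a quantitative measure of distance between environments.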
