LG · Apr 26, 2023

CROP: Towards Distributional-Shift Robust Reinforcement Learning using Compact Reshaped Observation Processing

Philipp Altmann, Fabian Ritz, Leonard Feuchtinger, Jonas Nüßlein, Claudia Linnhoff-Popien, Thomy Phan

arXiv:2304.13616v25.35 citations · h-index: 27 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of applying reinforcement learning (RL) safely to tasks under changing circumstances. The contribution appears incremental, since it builds on existing observation-crafting methods.

RL policies often fail to generalize to unseen scenarios because they overfit to the training environment. The paper proposes Compact Reshaped Observation Processing (CROP), which reduces the state information used for policy optimization. The authors report improved generalization in a distributionally shifted safety gridworld, and they compare CROP against full observability and data augmentation in two procedurally generated mazes of different sizes.

The safe application of reinforcement learning (RL) requires generalization from limited training data to unseen scenarios. Yet, fulfilling tasks under changing circumstances is a key challenge in RL. Current state-of-the-art approaches for generalization apply data augmentation techniques to increase the diversity of training data. Even though this prevents overfitting to the training environment(s), it hinders policy optimization. Crafting a suitable observation, only containing crucial information, has been shown to be a challenging task itself. To improve data efficiency and generalization capabilities, we propose Compact Reshaped Observation Processing (CROP) to reduce the state information used for policy optimization. By providing only relevant information, overfitting to a specific training layout is precluded and generalization to unseen environments is improved. We formulate three CROPs that can be applied to fully observable observation- and action-spaces and provide methodical foundation. We empirically show the improvements of CROP in a distributionally shifted safety gridworld. We furthermore provide benchmark comparisons to full observability and data-augmentation in two different-sized procedurally generated mazes.
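The idea of reducing the state information passed to the policy can be illustrated with a minimal sketch. It assumes a fully observable grid encoded as integers and keeps only a square window around the agent. The function name `radius_crop`, the cell encoding, and the padding value are hypothetical choices made for illustration; the abstract does not specify how the three CROPs are defined, so this should not be read as the authors' formulation.

```python
from typing import List, Tuple

Grid = List[List[int]]


def radius_crop(grid: Grid, agent_pos: Tuple[int, int], radius: int,
                pad_value: int = 1) -> Grid:
    """Return the (2*radius+1) x (2*radius+1) window centered on the agent.

    Cells that fall outside the grid are filled with pad_value
    (assumed here to encode a wall).
    """
    row, col = agent_pos
    height, width = len(grid), len(grid[0])
    window = []
    for r in range(row - radius, row + radius + 1):
        window_row = []
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < height and 0 <= c < width
            window_row.append(grid[r][c] if inside else pad_value)
        window.append(window_row)
    return window


# Hypothetical encoding: 0 = free, 1 = wall, 2 = agent, 3 = goal
grid = [
    [1, 1, 1, 1, 1],
    [1, 2, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 3, 1],
    [1, 1, 1, 1, 1],
]

# The policy would see only the 3x3 neighborhood instead of the full layout.
obs = radius_crop(grid, agent_pos=(1, 1), radius=1)
```

Because the cropped observation is expressed relative to the agent and has a fixed size, it no longer encodes the absolute layout of a particular training level, which is the kind of layout-specific information the abstract describes as a source of overfitting.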
