Extrapolation in Gridworld Markov-Decision Processes
This work addresses generalization in reinforcement learning for agents in grid-based environments. Its contribution is incremental: it builds on existing methods and evaluates them in a controlled setting.
The paper tackles the problem of extrapolation in reinforcement learning by testing four factors in a Gridworld environment. It finds that avoiding deterministic action choice at test time, using ego-centric representations, building in rotational and mirror symmetry via invariant convolutions, and adding a maximum entropy term to the loss each improve generalization to unseen states.
Extrapolation in reinforcement learning is the ability to generalize at test time given states that could never have occurred at training time. Here we consider four factors that lead to improved extrapolation in a simple Gridworld environment: (a) avoiding maximum Q-value (or other deterministic methods) for action choice at test time, (b) ego-centric representation of the Gridworld, (c) building rotational and mirror symmetry into the learning mechanism using rotational and mirror invariant convolution (rather than standard translation-invariant convolution), and (d) adding a maximum entropy term to the loss function to encourage equally good actions to be chosen equally often.
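The four factors (a) to (d) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: every function name, the temperature and beta parameters, the padding value, and the choice of averaging a kernel over its eight rotations and reflections are assumptions made here for illustration only.

```python
import numpy as np

def softmax(q, temperature=1.0):
    # Numerically stable softmax over a vector of Q-values.
    q = np.asarray(q, dtype=float)
    z = (q - q.max()) / temperature
    e = np.exp(z)
    return e / e.sum()

def sample_action(q, rng, temperature=1.0):
    # (a) Sample an action from a softmax over Q-values instead of
    # taking the deterministic argmax. `temperature` is an assumed knob.
    return int(rng.choice(len(q), p=softmax(q, temperature)))

def egocentric_view(grid, agent_rc, radius, pad_value=-1):
    # (b) Crop a (2*radius+1) x (2*radius+1) window centred on the agent,
    # padding cells outside the grid with `pad_value` (an assumed marker).
    padded = np.pad(grid, radius, constant_values=pad_value)
    r, c = agent_rc
    return padded[r:r + 2 * radius + 1, c:c + 2 * radius + 1]

def symmetrize_kernel(kernel):
    # (c) Average a square kernel over its 4 rotations and their mirror
    # images, so the kernel itself is unchanged by 90-degree rotations
    # and reflections. One simple construction, assumed for illustration.
    variants = []
    for k in range(4):
        rot = np.rot90(kernel, k)
        variants.append(rot)
        variants.append(np.fliplr(rot))
    return np.mean(variants, axis=0)

def entropy(p, eps=1e-12):
    # Shannon entropy of a discrete distribution (natural log).
    p = np.asarray(p, dtype=float)
    return float(-(p * np.log(p + eps)).sum())

def loss_with_entropy_bonus(base_loss, policy_probs, beta=0.01):
    # (d) Subtract beta * entropy from the loss, so that policies which
    # spread probability over equally good actions are favoured.
    return base_loss - beta * entropy(policy_probs)
```

With equal Q-values, sample_action picks each action with equal probability, whereas argmax would always return the first index; the entropy term pushes the learned policy toward the same behaviour during training.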