Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks
This paper addresses the challenge of efficient navigation and exploration for agents in complex environments, though it is an incremental improvement over existing all-goals updating methods.
The paper tackles the problem of scaling all-goals updates in reinforcement learning, which were previously limited to small tabular cases. It uses convolutional neural networks to generate Q-values for many goals simultaneously, and it reports better exploratory trajectories in games such as Montezuma's Revenge and Super Mario All-Stars.
Being able to reach any desired location in the environment can be a valuable asset for an agent. Learning a policy to navigate between all pairs of states individually is often not feasible. An all-goals updating algorithm uses each transition to learn Q-values towards all goals simultaneously and off-policy. However, the large number of expensive parallel updates has so far limited the approach to small tabular cases. To tackle this problem, we propose to use convolutional network architectures to generate Q-values and updates for a large number of goals at once. We demonstrate the accuracy and generalization qualities of the proposed method on randomly generated mazes and Sokoban puzzles. In the case of on-screen goal coordinates, the resulting mapping from frames to distance-maps directly informs the agent about which places are reachable and in how many steps. As an example application, we show that replacing the random actions in epsilon-greedy exploration with several actions towards feasible goals generates better exploratory trajectories on the Montezuma's Revenge and Super Mario All-Stars games.
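
To make the idea of an all-goals update concrete, the following is a minimal tabular sketch in plain Python. It is not the paper's convolutional architecture: it illustrates the small tabular setting that the abstract says the method starts from. Several choices here are assumptions made for illustration only: the empty 3x3 grid, the reward of -1 per step (so that the negated Q-value reads as a step count, in the spirit of a distance-map), the learning rate of 1 for a deterministic environment, and all function names (`step`, `all_goals_update`, `train`, `distance_map`, `goal_directed_rollout`).

```python
import random

# Hypothetical action set for a small grid: right, left, down, up.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def step(state, action, size):
    """Deterministic grid move; walking into the border leaves the state unchanged."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < size and 0 <= nc < size:
        return (nr, nc)
    return state


def all_goals_update(q, s, a, s_next, states, alpha=1.0):
    """Use ONE transition (s, a, s_next) to update the Q-values of EVERY goal.

    Assumed reward convention: -1 per step, episode ends on reaching the goal.
    The update is off-policy: the behaviour that produced the transition
    does not need to be aiming at any particular goal.
    """
    for g in states:
        if s_next == g:
            target = -1.0  # goal reached with this single step
        else:
            target = -1.0 + max(q[g][s_next])
        q[g][s][a] += alpha * (target - q[g][s][a])


def train(size=3, n_steps=20000, seed=0):
    """Collect transitions with a uniformly random walk and apply all-goals updates."""
    rng = random.Random(seed)
    states = [(r, c) for r in range(size) for c in range(size)]
    # q[goal][state] is a list with one value per action, initialised to 0.
    q = {g: {s: [0.0] * len(ACTIONS) for s in states} for g in states}
    s = (0, 0)
    for _ in range(n_steps):
        a = rng.randrange(len(ACTIONS))
        s_next = step(s, a, size)
        all_goals_update(q, s, a, s_next, states)
        s = s_next
    return q, states


def distance_map(q, goal, states):
    """Estimated number of steps from every state to `goal`."""
    return {s: 0 if s == goal else round(-max(q[goal][s])) for s in states}


def goal_directed_rollout(q, s, goal, size, max_steps):
    """Follow the greedy actions for `goal` for at most `max_steps` steps.

    A rough stand-in for taking several actions towards a feasible goal
    in place of a single random exploratory action.
    """
    path = [s]
    for _ in range(max_steps):
        if s == goal:
            break
        values = q[goal][s]
        s = step(s, values.index(max(values)), size)
        path.append(s)
    return path


if __name__ == "__main__":
    q, states = train()
    print(distance_map(q, (2, 2), states))
    print(goal_directed_rollout(q, (0, 0), (2, 2), 3, 10))
```

The inner loop over `states` in `all_goals_update` is the cost the abstract refers to: every transition triggers one update per goal, so the work grows with the number of goals. The paper's proposal, as summarized above, is to have a convolutional network produce the Q-values and updates for many goals at once instead of looping over a table.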