Guiding Reinforcement Learning Exploration Using Natural Language
This work addresses sample inefficiency in reinforcement learning for domain-specific applications such as games. The contribution is incremental, since it builds on existing policy shaping methods.
The paper uses natural language descriptions to guide exploration so that reinforcement learning generalizes better to unseen environments. The resulting modified policy shaping algorithm outperforms Q-learning and baseline policy shaping in evaluations on the Frogger game.
In this work, we present a technique that uses natural language to help reinforcement learning generalize to unseen environments. The technique uses neural machine translation, specifically encoder-decoder networks, to learn associations between natural language behavior descriptions and state-action information. We then use the learned model to guide agent exploration through a modified version of policy shaping, making the agent more effective at learning in unseen environments. We evaluate the technique on the popular arcade game Frogger under ideal and non-ideal conditions. The evaluation shows that our modified policy shaping algorithm improves over both a Q-learning agent and a baseline version of policy shaping.
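To make the idea of guiding exploration with policy shaping concrete, the following is a minimal sketch of the standard policy shaping combination step, in which the agent's own action distribution is multiplied by an advice distribution and renormalized. It is not the modified algorithm evaluated in this work, whose details are not given in the abstract. The function names (`softmax`, `shape_policy`, `sample_action`), the `temperature` parameter, and the assumption that the language model's output has already been converted into per-action probabilities (`advice_probs`) are all hypothetical and chosen for illustration only.

```python
import math
import random


def softmax(q_values, temperature=1.0):
    """Boltzmann distribution over actions from Q-values (hypothetical helper)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def shape_policy(q_values, advice_probs, temperature=1.0):
    """Combine the agent's distribution with an advice distribution.

    Assumption: advice_probs holds one probability per action, e.g. derived
    from a language-based model. The two distributions are multiplied
    elementwise and renormalized, as in standard policy shaping.
    """
    base = softmax(q_values, temperature)
    combined = [b * a for b, a in zip(base, advice_probs)]
    total = sum(combined)
    if total == 0.0:
        # Advice rules out every action: fall back to the agent's own policy.
        return base
    return [c / total for c in combined]


def sample_action(probs, rng=random):
    """Draw an action index according to the shaped distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With uniform Q-values the shaped distribution simply follows the advice, and as the Q-values become more sharply separated the agent's own estimates increasingly dominate the product. This is one reason a multiplicative combination suits exploration guidance: advice matters most early in learning, when the agent knows least.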