Regulating Reward Training by Means of Certainty Prediction in a Neural Network-Implemented Pong Game
This work aims to improve training efficiency and performance in reinforcement learning for video games, though it appears incremental because it builds on existing reward-modulated methods.
The researchers address reinforcement learning in Pong by introducing a model in which an intuition neural network regulates reward training based on certainty predictions. The model quickly outperforms a simpler architecture and, with additional training, outscores a near-perfect opponent by an increasingly wide margin.
We present the first reinforcement-learning model that self-improves its reward-modulated training by means of a continuously improving "intuition" neural network. An agent was trained to play the arcade video game Pong with two reward-based alternatives: one in which the paddle was placed randomly during training, and a second in which the paddle was simultaneously trained on three additional neural networks so that it could develop a sense of "certainty" about how likely its own predicted paddle position was to return the ball. If the agent was less than 95% certain to return the ball, the policy used an intuition neural network to place the paddle. We trained both architectures for an equal number of epochs and tested learning performance by letting the trained programs play against a near-perfect opponent. We found that the reinforcement learning model that uses an intuition neural network to place the paddle during reward training quickly overtakes the simple architecture in its ability to outplay the near-perfect opponent, and that it outscores that opponent by an increasingly wide margin after additional epochs of training.
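The certainty gate described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `place_paddle`, its callable arguments, and the toy stand-in functions are all hypothetical, and the toy functions replace what the paper implements as neural networks. Only the 95% threshold is taken from the text.

```python
from typing import Callable, Tuple

# From the abstract: below 95% certainty, the intuition network places the paddle.
CERTAINTY_THRESHOLD = 0.95

# Hypothetical state for illustration: (current ball height, height where the ball will land).
State = Tuple[float, float]


def place_paddle(
    state: State,
    policy: Callable[[State], float],
    certainty: Callable[[State, float], float],
    intuition: Callable[[State], float],
    threshold: float = CERTAINTY_THRESHOLD,
) -> Tuple[float, str]:
    """Return the chosen paddle position and which component chose it."""
    proposed = policy(state)
    # Estimate how likely the proposed position is to return the ball.
    if certainty(state, proposed) < threshold:
        # Not certain enough: defer to the intuition component.
        return intuition(state), "intuition"
    return proposed, "policy"


# Toy stand-ins (assumptions, not the paper's networks).
def toy_policy(state: State) -> float:
    # Naively aim at the ball's current height.
    return state[0]


def toy_certainty(state: State, paddle_y: float) -> float:
    # Certainty falls off linearly with distance from the landing height.
    return max(0.0, 1.0 - abs(state[1] - paddle_y))


def toy_intuition(state: State) -> float:
    # Aim directly at the landing height.
    return state[1]


if __name__ == "__main__":
    print(place_paddle((0.5, 0.52), toy_policy, toy_certainty, toy_intuition))
    print(place_paddle((0.5, 0.90), toy_policy, toy_certainty, toy_intuition))
```

In the first call the toy certainty is about 0.98, so the policy's own proposal is kept; in the second it is about 0.6, so the intuition stand-in places the paddle. How the actual certainty estimate is computed from the three additional networks is not specified in the abstract and is not modeled here.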