Criticality-Based Varying Step-Number Algorithm for Reinforcement Learning
This work addresses the challenge of making action selection more efficient in reinforcement learning for domains such as gaming, though it appears incremental because it builds on existing step-number methods.
The paper tackles the problem of inefficient step selection in reinforcement learning by introducing a criticality-based varying step-number algorithm (CVS) that adapts the step count to state criticality, and reports that it outperforms Deep Q-Learning and Monte Carlo in domains such as Atari Pong.
In the context of reinforcement learning, we introduce the concept of the criticality of a state, which indicates the extent to which the choice of action in that state influences the expected return. That is, a state in which the choice of action is more likely to influence the final outcome is considered more critical than one in which it is less likely to do so. We formulate a criticality-based varying step-number algorithm (CVS), a flexible step-number algorithm that uses a criticality function either provided by a human or learned directly from the environment. We test it in three domains: the Atari Pong environment, the Road-Tree environment, and the Shooter environment. We demonstrate that CVS is able to outperform popular learning algorithms such as Deep Q-Learning and Monte Carlo.
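To make the idea of a criticality-dependent step number concrete, the sketch below computes a bootstrapped return whose lookahead length grows until the criticality accumulated along the trajectory reaches a threshold. This is an illustration under stated assumptions, not necessarily the rule CVS uses: the abstract does not specify how criticality maps to a step number, so the accumulation rule, the `threshold` parameter, the assumption that criticality lies in [0, 1], and the function name `varying_step_return` are all hypothetical.

```python
from typing import Sequence, Tuple


def varying_step_return(
    rewards: Sequence[float],
    criticalities: Sequence[float],
    bootstrap_values: Sequence[float],
    t: int,
    gamma: float = 0.99,
    threshold: float = 1.0,
) -> Tuple[float, int]:
    """Return (target, n) for time step t of one finished episode.

    rewards[k]          reward received after acting in state k
    criticalities[k]    criticality of state k (assumed to lie in [0, 1])
    bootstrap_values[k] value estimate of state k, e.g. max_a Q(s_k, a)

    Assumption (not from the source): the lookahead is extended until the
    summed criticality of the visited states reaches `threshold`.
    """
    horizon = len(rewards)
    target, discount, accumulated, n = 0.0, 1.0, 0.0, 0

    # Accumulate discounted rewards until enough criticality has been seen
    # or the episode ends.
    while t + n < horizon:
        target += discount * rewards[t + n]
        accumulated += criticalities[t + n]
        discount *= gamma
        n += 1
        if accumulated >= threshold:
            break

    # Bootstrap from the value estimate only if the episode did not end
    # inside the lookahead window.
    if t + n < horizon:
        target += discount * bootstrap_values[t + n]
    return target, n


if __name__ == "__main__":
    rewards = [0.0, 0.0, 0.0, 1.0]
    values = [0.5, 0.5, 0.5, 0.5]
    criticalities = [0.1, 0.2, 0.9, 0.5]
    print(varying_step_return(rewards, criticalities, values, t=0, gamma=1.0))
```

Under this assumed rule, the two baselines named in the abstract appear as limiting cases: if every state has criticality at or above the threshold, the target reduces to a one-step bootstrapped target of the kind used in Deep Q-Learning, and if every state has criticality zero, the lookahead runs to the end of the episode and the target is the Monte Carlo return.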