LG | AI | Jan 13, 2022

Criticality-Based Varying Step-Number Algorithm for Reinforcement Learning

arXiv:2201.05034v1
AI Analysis

This work addresses how to choose the step number in reinforcement learning, with test domains that include games such as Atari Pong. It appears incremental, since it builds on existing step-number methods.

The paper addresses the choice of step number in reinforcement learning by introducing a criticality-based varying step number algorithm (CVS), which adapts the step number to state criticality. The authors report that CVS outperforms Deep Q-Learning and Monte Carlo in three domains, including Atari Pong.

In the context of reinforcement learning, we introduce the concept of the criticality of a state, which indicates the extent to which the choice of action in that state influences the expected return. That is, a state in which the choice of action is more likely to influence the final outcome is considered more critical than a state in which it is less likely to do so. We formulate a criticality-based varying step number algorithm (CVS), a flexible step number algorithm that uses a criticality function either provided by a human or learned directly from the environment. We test it in three domains: the Atari Pong, Road-Tree, and Shooter environments. We demonstrate that CVS is able to outperform popular learning algorithms such as Deep Q-Learning and Monte Carlo.
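To make the idea of a varying step number concrete, here is a minimal sketch of one way criticality could set the lookahead of a bootstrapped return. The abstract does not state the rule CVS uses, so everything here is an assumption made for illustration: the rule itself (look ahead until the accumulated criticality reaches a threshold), the function names `varying_step_number` and `varying_step_target`, and the parameters `threshold` and `gamma`. It should not be read as the authors' algorithm.

```python
from typing import Sequence


def varying_step_number(criticalities: Sequence[float], start: int,
                        threshold: float = 1.0) -> int:
    """Hypothetical rule: count steps from `start` until the accumulated
    criticality reaches `threshold`, or until the episode ends."""
    total = 0.0
    n = 0
    for c in criticalities[start:]:
        total += c
        n += 1
        if total >= threshold:
            break
    return n


def varying_step_target(rewards: Sequence[float], values: Sequence[float],
                        criticalities: Sequence[float], start: int,
                        gamma: float = 0.99, threshold: float = 1.0) -> float:
    """n-step return from `start`, with n chosen by the hypothetical rule above.

    rewards[t] is the reward received after acting in state t, values[t] is
    the current value estimate of state t, and the episode ends after the
    last reward, so the terminal state contributes no bootstrap term.
    """
    n = varying_step_number(criticalities, start, threshold)
    # Discounted sum of the n observed rewards.
    target = sum(gamma ** k * rewards[start + k] for k in range(n))
    end = start + n
    if end < len(rewards):
        # Episode not finished: bootstrap from the value estimate n steps ahead.
        target += gamma ** n * values[end]
    return target


# Toy episode (made-up numbers): only the third state is highly critical.
rewards = [0.0, 0.0, 1.0, 0.0]
values = [0.5, 0.5, 0.5, 0.5]
criticalities = [0.1, 0.2, 0.9, 0.1]
print(varying_step_target(rewards, values, criticalities, start=0, gamma=1.0))
```

Under this assumed rule, a very small threshold reduces the target to a one-step bootstrapped target, while a threshold that is never reached yields the full Monte Carlo return, so the threshold moves the update between the two.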

