LG, AI, IT | Jan 21, 2024

Information-Theoretic State Variable Selection for Reinforcement Learning

arXiv:2401.11512v1 | 5 citations
Originality: Incremental advance
AI Analysis

This paper addresses inefficient state representation in reinforcement learning, offering researchers and practitioners a method to improve learning efficiency. The advance is incremental, as it builds on existing feature selection approaches.

The paper tackles the challenge of selecting state variables in reinforcement learning by introducing the Transfer Entropy Redundancy Criterion (TERC), which identifies and excludes irrelevant variables to improve sample efficiency. The reported speed-ups hold across three algorithm classes (tabular Q-learning, Actor-Critic, and PPO) in a variety of environments.

Identifying the most suitable variables to represent the state is a fundamental challenge in Reinforcement Learning (RL). These variables must efficiently capture the information necessary for making optimal decisions. To address this problem, we introduce the Transfer Entropy Redundancy Criterion (TERC), an information-theoretic criterion that determines whether there is entropy transferred from state variables to actions during training. We define an algorithm based on TERC that provably excludes from the state those variables that have no effect on the final performance of the agent, resulting in more sample-efficient learning. Experimental results show that this speed-up is present across three different algorithm classes (represented by tabular Q-learning, Actor-Critic, and Proximal Policy Optimization (PPO)) in a variety of environments. Furthermore, to highlight the differences between the proposed methodology and current state-of-the-art feature selection approaches, we present a series of controlled experiments on synthetic data before generalizing to real-world decision-making tasks. We also introduce a representation of the problem, in the form of Bayesian networks, that compactly captures the transfer of information from state variables to actions.
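To make the idea of testing whether a state variable carries information about the action more concrete, the following is a minimal sketch under stated assumptions. It is not the paper's TERC definition or algorithm: it swaps in a simpler quantity, a plug-in estimate of the conditional mutual information I(A; X_i | X_rest) on discrete samples, and ignores the temporal history that transfer entropy conditions on. The function names, the toy data, and the `THRESHOLD` value are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in bits from paired discrete samples."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    total = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # p(x,y,z) * log2( p(x,y,z) p(z) / (p(x,z) p(y,z)) ), written with counts
        ratio = n_xyz * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)])
        total += (n_xyz / n) * math.log2(ratio)
    return total


def redundancy_scores(states, actions):
    """For each state variable i, estimate I(A; X_i | all other variables)."""
    n_vars = len(states[0])
    scores = []
    for i in range(n_vars):
        xi = [s[i] for s in states]
        rest = [tuple(v for j, v in enumerate(s) if j != i) for s in states]
        scores.append(conditional_mutual_information(xi, actions, rest))
    return scores


# Toy data (an assumption): the action copies variable 0,
# while variable 1 is independent noise.
rng = random.Random(0)
states = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(5000)]
actions = [s[0] for s in states]

scores = redundancy_scores(states, actions)
THRESHOLD = 0.01  # hypothetical cut-off in bits, not taken from the paper
kept = [i for i, score in enumerate(scores) if score > THRESHOLD]
print(scores, kept)
```

On this toy data the noise variable scores zero and is dropped, while the variable that drives the action is kept. On real data, plug-in estimates from finite samples tend to be biased upward, so a fixed threshold like this one would need care.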

Code Implementations: 1 repo
