LG | Jan 24, 2025

Reducing Action Space for Deep Reinforcement Learning via Causal Effect Estimation

arXiv:2501.14543v1 | h-index: 7 | Has Code
Originality: Incremental advance
AI Analysis

This work gives reinforcement learning practitioners a quantitative way to reduce action-space redundancy. It is an incremental advance, since it builds on prior action-reduction methods.

The paper tackles inefficient exploration in deep reinforcement learning caused by large, redundant action spaces. It proposes estimating the causal effect of each action and suppressing the redundant ones, and it reports improved exploration efficiency in simulated environments with redundant actions.

Intelligent decision-making within large and redundant action spaces remains challenging in deep reinforcement learning. Considering similar but ineffective actions at each step can lead to repetitive and unproductive trials. Existing methods attempt to improve agent exploration by reducing or penalizing redundant actions, yet they fail to provide quantitative and reliable evidence to determine redundancy. In this paper, we propose a method to improve exploration efficiency by estimating the causal effects of actions. Unlike prior methods, our approach offers quantitative results regarding the causality of actions for one-step transitions. We first pre-train an inverse dynamics model to serve as prior knowledge of the environment. Subsequently, we classify actions across the entire action space at each time step and estimate the causal effect of each action to suppress redundant actions during exploration. We provide a theoretical analysis to demonstrate the effectiveness of our method and present empirical results from simulations in environments with redundant actions to evaluate its performance. Our implementation is available at https://github.com/agi-brain/cee.git.
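To make the idea of suppressing redundant actions via one-step effects concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a known tabular transition model in place of the learned inverse dynamics model, scores each action by the total variation distance between its next-state distribution and that of a reference no-op action, and masks actions whose score falls below a threshold. The corridor environment, the action names, the `slip` parameter, and the threshold are all hypothetical choices made for illustration.

```python
from typing import Dict

Dist = Dict[int, float]  # next state -> probability

# Hypothetical action set: "bump" never moves the agent, so it is always redundant.
ACTIONS = ["noop", "left", "right", "bump"]


def step_dist(state: int, action: str, n_cells: int = 5, slip: float = 0.1) -> Dist:
    """One-step next-state distribution in a toy 1-D corridor (assumed known)."""
    delta = {"noop": 0, "left": -1, "right": 1, "bump": 0}[action]
    target = min(max(state + delta, 0), n_cells - 1)  # walls clip the move
    dist: Dist = {}
    dist[target] = dist.get(target, 0.0) + (1.0 - slip)  # intended outcome
    dist[state] = dist.get(state, 0.0) + slip            # agent slips and stays
    return dist


def total_variation(p: Dist, q: Dist) -> float:
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


def action_effects(state: int, baseline: str = "noop") -> Dict[str, float]:
    """Score each action by how far it shifts the next state from the baseline."""
    ref = step_dist(state, baseline)
    return {a: total_variation(step_dist(state, a), ref) for a in ACTIONS}


def action_mask(state: int, threshold: float = 0.05, baseline: str = "noop") -> Dict[str, bool]:
    """True = keep the action for exploration, False = suppress it."""
    effects = action_effects(state, baseline)
    mask = {a: effects[a] > threshold for a in ACTIONS}
    mask[baseline] = True  # keep one representative of "do nothing"
    return mask


if __name__ == "__main__":
    for s in (0, 2):
        print(s, action_effects(s), action_mask(s))
```

At the left wall (state 0), `left` and `bump` produce the same next-state distribution as `noop`, so the mask drops them; in the middle of the corridor only `bump` is dropped. In a deep RL setting the distributions would come from a learned model rather than a lookup, and the mask would typically be applied to the policy's action logits before sampling.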

Code Implementations: 1 repo
