LG · AI · Jan 2, 2022

Toward Causal-Aware RL: State-Wise Action-Refined Temporal Difference

arXiv:2201.00354v2 · 12 citations
AI Analysis

This work addresses inefficient exploration in RL for continuous control tasks. The contribution is incremental: it builds on existing exploration strategies by incorporating causality.

The paper tackles action space redundancy in continuous control RL. It proposes state-dependent action selection methods that discover causal relationships between actions and task rewards, which improves learning efficiency in action-redundant tasks.

Although it is well known that exploration plays a key role in Reinforcement Learning (RL), prevailing exploration strategies for continuous control tasks are mainly based on naive isotropic Gaussian noise: they ignore the causal relationship between the action space and the task, and they treat all action dimensions as equally important. In this work, we propose to conduct interventions on the primal action space to discover the causal relationship between the action space and the task reward. We propose State-Wise Action Refined (SWAR), a method that addresses action space redundancy and promotes causality discovery in RL. We formulate causality discovery in RL tasks as a state-dependent action space selection problem and propose two practical algorithms as solutions. The first approach, TD-SWAR, detects task-related actions during temporal difference learning, while the second approach, Dyn-SWAR, reveals important actions through dynamic model prediction. Empirically, both methods help explain the decisions made by RL agents and improve learning efficiency in action-redundant tasks.
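The contrast the abstract draws can be illustrated with a minimal sketch. This is not the paper's TD-SWAR or Dyn-SWAR algorithm. It only shows (a) isotropic Gaussian exploration, which perturbs every action dimension at the same scale, and (b) a state-dependent binary mask that keeps some action dimensions and overwrites the rest with a fixed default. The function names, the toy selector rule, and the default value of 0.0 are all assumptions made for illustration.

```python
import numpy as np


def isotropic_gaussian_exploration(action, sigma, rng):
    """Add Gaussian noise of the same scale to every action dimension."""
    return action + sigma * rng.standard_normal(action.shape)


def toy_selector(state):
    """Hypothetical state-dependent selector (an assumption, not from the paper).

    Returns a binary mask over 4 action dimensions: the first two are
    always kept, and the last two are kept only when state[0] > 0.
    """
    keep_tail = state[0] > 0
    return np.array([True, True, keep_tail, keep_tail])


def refine_action(action, mask, default=0.0):
    """Keep masked-in dimensions; replace the others with a fixed default."""
    return np.where(mask, action, default)


rng = np.random.default_rng(0)
policy_action = np.zeros(4)  # placeholder policy output
noisy = isotropic_gaussian_exploration(policy_action, sigma=0.1, rng=rng)

state = np.array([-1.0, 0.5])  # toy state; state[0] <= 0 drops the tail
mask = toy_selector(state)
refined = refine_action(noisy, mask)
```

In this toy setup, exploration noise still reaches the first two dimensions, while the last two are pinned to the default whenever the selector judges them irrelevant in the current state.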

Code Implementations: 1 repo
