Counterfactual Data Augmentation using Locally Factored Dynamics
This work addresses sample efficiency for reinforcement learning in domains like robotic control, though it appears incremental as it builds on existing causal modeling and data augmentation techniques.
The paper tackles sample inefficiency in reinforcement learning for dynamic processes with sparse interactions. It introduces local causal models (LCMs) and a Counterfactual Data Augmentation (CoDA) algorithm, which significantly improves RL agent performance in locally factored tasks.
Many dynamic processes, including common scenarios in robotic control and reinforcement learning (RL), involve a set of interacting subprocesses. Though the subprocesses are not independent, their interactions are often sparse, and the dynamics at any given time step can often be decomposed into locally independent causal mechanisms. Such local causal structures can be leveraged to improve the sample efficiency of sequence prediction and off-policy reinforcement learning. We formalize this by introducing local causal models (LCMs), which are induced from a global causal model by conditioning on a subset of the state space. We propose an approach to inferring these structures given an object-oriented state representation, as well as a novel algorithm for Counterfactual Data Augmentation (CoDA). CoDA uses local structures and an experience replay to generate counterfactual experiences that are causally valid in the global model. We find that CoDA significantly improves the performance of RL agents in locally factored tasks, including the batch-constrained and goal-conditioned settings.
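To make the counterfactual generation step concrete, the following is a minimal sketch of the swapping idea, not the authors' implementation. It assumes each state is an array with one row per object, and that a boolean local interaction mask per transition is already available (the abstract says these structures are inferred but does not say how). Actions and rewards are left out for brevity, and any further validity checks on the generated sample are omitted. The names `connected_components` and `coda_swap` are hypothetical.

```python
import numpy as np


def connected_components(adj):
    """Connected components of an undirected graph.

    adj is a boolean (n, n) matrix; an edge in either direction links two nodes.
    Returns a list of frozensets of node indices.
    """
    n = adj.shape[0]
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            i = stack.pop()
            if i in comp:
                continue
            comp.add(i)
            stack.extend(
                j for j in range(n)
                if (adj[i, j] or adj[j, i]) and j not in comp
            )
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def coda_swap(t1, t2, mask1, mask2):
    """Build one counterfactual transition from two factual ones.

    t1, t2: (state, next_state) pairs, each an array of shape (n_objects, dim).
    mask1, mask2: assumed-given boolean (n_objects, n_objects) local interaction
    masks for t1 and t2 (hypothetical inputs for this sketch).
    Returns (state, next_state) or None if no component can be swapped.
    """
    n = mask1.shape[0]
    comps1 = set(connected_components(mask1))
    comps2 = set(connected_components(mask2))
    # A component is swappable only if it is independent of the rest in BOTH
    # transitions, i.e. it appears in both partitions and is a proper subset.
    shared = [c for c in comps1 & comps2 if len(c) < n]
    if not shared:
        return None
    # Deterministic choice for illustration: the lexicographically first one.
    idx = sorted(min(shared, key=lambda c: sorted(c)))
    s1, ns1 = t1
    s2, ns2 = t2
    new_s, new_ns = s1.copy(), ns1.copy()
    # Take the chosen objects from t2 and keep everything else from t1.
    new_s[idx] = s2[idx]
    new_ns[idx] = ns2[idx]
    return new_s, new_ns
```

In a replay-based agent, one plausible use is to sample pairs of stored transitions, call `coda_swap` on each pair, and append any non-`None` result to the buffer as additional training data. When every object interacts in one of the two transitions, the graph has a single component and the function returns `None`, which reflects the requirement that the dynamics be locally factored for augmentation to apply.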