LG · AI · ML · Mar 12, 2020

Invariant Causal Prediction for Block MDPs

arXiv:2003.06016v2 · 155 citations
AI Analysis

This paper addresses the challenge of generalization in reinforcement learning for real-world applications. The contribution is incremental: it builds on existing state abstraction and causal inference frameworks.

The paper tackles the problem of learning state abstractions that generalize across environments in block MDPs, where environments share latent dynamics but differ in their observations. It uses invariant causal prediction to identify the causal features with respect to the return, and reports improved generalization over baselines in both linear and nonlinear settings.

Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.
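To make the idea of invariant prediction concrete, the following is a minimal sketch of a generic, linear invariant-causal-prediction-style subset search on toy data. It is not the paper's MISA algorithm. The data-generating process, the function names (`simulate_env`, `residuals_look_invariant`, `invariant_feature_set`), and the tolerance thresholds are all assumptions made for illustration, and the crude mean/variance comparison stands in for a proper statistical test. Feature 0 is a cause of the target with the same mechanism in every environment; feature 1 is an effect of the target whose noise level changes between environments.

```python
from itertools import combinations

import numpy as np


def simulate_env(rng, n, x1_scale, x2_noise_scale):
    """Toy environment (assumed for illustration): x1 -> y -> x2."""
    x1 = rng.normal(0.0, x1_scale, n)                # cause; its scale varies by environment
    y = 2.0 * x1 + rng.normal(0.0, 1.0, n)           # invariant mechanism for the target
    x2 = y + rng.normal(0.0, x2_noise_scale, n)      # effect; its noise varies by environment
    return np.column_stack([x1, x2]), y


def residuals_look_invariant(X_envs, y_envs, subset,
                             var_ratio_tol=1.25, mean_tol=0.1):
    """Fit one pooled linear model on `subset`, then compare residuals per environment.

    Heuristic check only: thresholds on residual means and variances,
    not a calibrated hypothesis test.
    """
    cols = np.array(subset, dtype=int)
    X = np.vstack(X_envs)[:, cols]
    y = np.concatenate(y_envs)
    design = np.column_stack([np.ones(len(y)), X])   # intercept plus selected features
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef

    # Split the pooled residuals back into one block per environment.
    sizes = [len(v) for v in y_envs]
    parts = np.split(resid, np.cumsum(sizes)[:-1])
    means = [p.mean() for p in parts]
    variances = [p.var() for p in parts]
    return bool(max(means) - min(means) <= mean_tol
                and max(variances) / min(variances) <= var_ratio_tol)


def invariant_feature_set(X_envs, y_envs):
    """Intersect all feature subsets whose residuals look invariant."""
    d = X_envs[0].shape[1]
    accepted = [set(s)
                for k in range(d + 1)
                for s in combinations(range(d), k)
                if residuals_look_invariant(X_envs, y_envs, s)]
    return set.intersection(*accepted) if accepted else set()


rng = np.random.default_rng(0)
envs = [simulate_env(rng, 5000, 1.0, 0.5),
        simulate_env(rng, 5000, 2.0, 2.0)]
X_envs = [X for X, _ in envs]
y_envs = [y for _, y in envs]

selected = invariant_feature_set(X_envs, y_envs)
print("invariant feature indices:", selected)
```

In this toy setup only the subset containing feature 0 should pass the invariance check, so the search is expected to discard the environment-dependent feature 1. The exhaustive subset search is exponential in the number of features, which is one reason a sketch like this only suits small, linear examples.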

Code Implementations: 1 repo
