A Relative Ignorability Framework for Decision-Relevant Observability in Control Theory and Reinforcement Learning
This work expands the theoretical foundations for safe, data-efficient AI in real-world environments with partial observability, though it appears incremental relative to existing causal-inference and RL paradigms.
The paper tackles sequential decision-making with incomplete data by introducing a 'relative ignorability' framework that unifies causal-inference and reinforcement-learning approaches, showing that, under certain conditions, Q-learning can converge even when the Markov property fails.
Sequential decision-making systems routinely operate with missing or incomplete data. Classical reinforcement learning theory, commonly used to solve sequential decision problems, assumes Markovian observability, an assumption that may not hold under partial observability. Causal inference paradigms, by contrast, formalise the ignorability of missingness. We show that these views can be unified and generalised to guarantee Q-learning convergence even when the Markov property fails. To do so, we introduce the concept of relative ignorability, a graphical-causal criterion that refines the requirements for accurate decision-making based on incomplete data. Both theoretical results and simulations show that non-Markovian stochastic processes whose missingness is relatively ignorable with respect to causal estimands can still be optimised using standard reinforcement learning algorithms. These results extend the theoretical foundations of safe, data-efficient AI to real-world environments where complete information is unattainable.
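As a rough illustration of why the structure of missingness matters for Q-learning, the sketch below runs tabular Q-learning on a hypothetical two-state, two-action MDP (all numbers invented for illustration) while some transitions go unobserved. It covers only the simplest ignorable case, missingness completely at random (MCAR), and contrasts it with missingness that depends on the next state. It is not an implementation of the paper's relative-ignorability criterion, which is graphical and more general; the helper names `optimal_q`, `q_learning`, and `keep_prob` are likewise hypothetical.

```python
import random

# Hypothetical 2-state, 2-action MDP (illustrative numbers, not from the paper).
GAMMA = 0.5
REWARD = [[0.0, 1.0], [2.0, 0.0]]        # REWARD[s][a]
P_TO_STATE1 = [[0.8, 0.2], [0.5, 0.9]]   # P(s' = 1 | s, a)


def optimal_q(iters=200):
    """Ground-truth Q* computed by value iteration on the known model."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        v = [max(q[0]), max(q[1])]
        q = [[REWARD[s][a] + GAMMA * (P_TO_STATE1[s][a] * v[1]
                                      + (1 - P_TO_STATE1[s][a]) * v[0])
              for a in range(2)] for s in range(2)]
    return q


def q_learning(keep_prob, steps=50_000, seed=0):
    """Tabular Q-learning with a uniform-random behaviour policy.

    keep_prob(s, a, r, s_next) is a hypothetical missingness model: the
    probability that a transition is observed. Unobserved transitions are
    skipped (no update), but the environment still advances.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)
        s_next = 1 if rng.random() < P_TO_STATE1[s][a] else 0
        r = REWARD[s][a]
        if rng.random() < keep_prob(s, a, r, s_next):
            visits[s][a] += 1
            alpha = visits[s][a] ** -0.7  # Robbins-Monro step size
            target = r + GAMMA * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
        s = s_next
    return q


def max_error(q, q_star):
    """Largest absolute deviation between a learned table and Q*."""
    return max(abs(q[s][a] - q_star[s][a]) for s in range(2) for a in range(2))


q_star = optimal_q()
# MCAR: each transition is dropped with probability 0.3, independently of everything.
q_mcar = q_learning(lambda s, a, r, s_next: 0.7)
# Outcome-dependent: transitions that land in state 1 are usually dropped.
q_mnar = q_learning(lambda s, a, r, s_next: 0.25 if s_next == 1 else 1.0)

print("MCAR max |Q - Q*|:", round(max_error(q_mcar, q_star), 3))
print("outcome-dependent max |Q - Q*|:", round(max_error(q_mnar, q_star), 3))
```

Under MCAR dropping, the retained transitions are a random thinning of the full stream, so the learned table should approach Q*. When transitions into state 1 are preferentially dropped, the retained next-state frequencies are skewed, and Q-learning settles on the fixed point of a different transition kernel. Note that this toy MDP is itself Markov; the setting studied in the abstract, where missingness breaks the Markov property, is harder.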