AI · Jan 11, 2018

Counterfactual equivalence for POMDPs, and underlying deterministic environments

arXiv:1801.03737v2 · 5.63 citations

Originality Synthesis-oriented

AI Analysis

The paper provides a theoretical framework for analyzing uncertainty and learning in POMDPs, which are widely used in machine learning. The work appears incremental, however: it builds on existing POMDP theory and does not demonstrate broad practical applications.

The paper tackles the problem of understanding information and causal structures in Partially Observable Markov Decision Processes (POMDPs) by introducing concepts of equivalent and counterfactually equivalent POMDPs, showing that any POMDP is counterfactually equivalent to a deterministic POMDP with uncertainty in the initial state for any finite number of turns.

Partially Observable Markov Decision Processes (POMDPs) are rich environments often used in machine learning, but the information and causal structures in POMDPs have received relatively little study. This paper presents the concepts of equivalent and counterfactually equivalent POMDPs, where agents cannot distinguish which environment they are in through any observations and actions. It shows that any POMDP is counterfactually equivalent, for any finite number of turns, to a deterministic POMDP with all uncertainty concentrated into the initial state. This allows a better understanding of POMDP uncertainty, information, and learning.
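The idea of concentrating all uncertainty into the initial state can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the two-state toy POMDP, its probabilities, and the names `transition_probs`, `observation_probs`, `pick`, `sample_initial_state`, and the two rollout functions are all hypothetical. The sketch pre-draws the noise for a fixed number of turns and stores it in the initial state, so that every later transition and observation is a deterministic function of that state and the chosen actions. It is one simple derandomization, not necessarily the construction the paper uses, and it does not by itself establish counterfactual equivalence.

```python
import random

# Hypothetical toy POMDP (not from the paper): 2 hidden states, 2 actions,
# and a noisy observation of the current hidden state.
STATES = [0, 1]

def transition_probs(state, action):
    """P(next_state | state, action) for the toy POMDP."""
    stay = 0.9 if action == 0 else 0.3
    return {state: stay, 1 - state: 1 - stay}

def observation_probs(state):
    """P(observation | state): the true state is reported 80% of the time."""
    return {state: 0.8, 1 - state: 0.2}

def pick(probs, u):
    """Inverse-CDF selection: deterministically map u in [0, 1) to an outcome."""
    cumulative = 0.0
    for outcome, p in sorted(probs.items()):
        cumulative += p
        if u < cumulative:
            return outcome
    return outcome  # guard against floating-point round-off

def stochastic_rollout(actions, rng):
    """Ordinary POMDP: randomness is drawn turn by turn."""
    state = rng.choice(STATES)
    observations = []
    for action in actions:
        state = pick(transition_probs(state, action), rng.random())
        observations.append(pick(observation_probs(state), rng.random()))
    return tuple(observations)

def sample_initial_state(n_turns, rng):
    """Initial state of the deterministic version: the hidden start state
    plus all the noise needed for n_turns, drawn once up front."""
    start = rng.choice(STATES)
    noise = tuple((rng.random(), rng.random()) for _ in range(n_turns))
    return (start, noise)

def deterministic_rollout(initial_state, actions):
    """Deterministic POMDP: no randomness is drawn after the initial state."""
    state, noise = initial_state
    observations = []
    for action, (u_trans, u_obs) in zip(actions, noise):
        state = pick(transition_probs(state, action), u_trans)
        observations.append(pick(observation_probs(state), u_obs))
    return tuple(observations)
```

In this sketch, calling `deterministic_rollout` twice with the same initial state and action sequence always returns the same observations. Because `sample_initial_state` consumes random numbers in the same order as `stochastic_rollout`, both versions produce the same distribution over observation histories for any fixed action sequence of at most `n_turns` actions.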
