Dynamical-VAE-based Hindsight to Learn the Causal Dynamics of Factored-POMDPs
The paper addresses accurate state representation learning in POMDPs for reinforcement learning applications, though it appears incremental because it builds on existing hindsight and VAE methods.
The paper tackles the challenge of learning causal dynamics from partial observations in factored-POMDPs. It introduces a Dynamical-VAE with an extended hindsight framework that integrates past, current, and future information, and it reports that this uncovers the causal graph more effectively than existing models.
Learning representations of underlying environmental dynamics from partial observations is a critical challenge in machine learning. In the context of Partially Observable Markov Decision Processes (POMDPs), state representations are often inferred from the history of past observations and actions. We demonstrate that incorporating future information is essential to accurately capture causal dynamics and enhance state representations. To address this, we introduce a Dynamical Variational Auto-Encoder (DVAE) designed to learn causal Markovian dynamics from offline trajectories in a POMDP. Our method employs an extended hindsight framework that integrates past, current, and multi-step future information within a factored-POMDP setting. Empirical results reveal that this approach uncovers the causal graph governing hidden state transitions more effectively than history-based and typical hindsight-based models.
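The two ideas in the abstract, a hindsight context that spans past, current, and multi-step future information, and a transition over state factors constrained by a causal graph, can be illustrated with a minimal sketch. This is not the paper's architecture: the function names `hindsight_context` and `masked_transition`, the window parameter `k_future`, and the single-layer tanh transition are hypothetical choices made only for illustration.

```python
import numpy as np

def hindsight_context(obs, actions, t, k_future):
    """Split a trajectory into past, current, and future segments around step t.

    obs: array of shape (T, obs_dim); actions: array of shape (T, act_dim).
    k_future is an assumed hyperparameter: how many future steps the
    encoder is allowed to see (truncated at the end of the trajectory).
    """
    steps = np.concatenate([obs, actions], axis=1)  # (T, obs_dim + act_dim)
    end = min(steps.shape[0], t + 1 + k_future)
    past = steps[:t]             # everything before step t
    current = steps[t:t + 1]     # step t itself
    future = steps[t + 1:end]    # up to k_future steps after t
    return past, current, future

def masked_transition(state, action, graph, weights):
    """Toy factored transition: factor j of the next state depends only on
    the inputs i for which graph[i, j] == 1.

    graph and weights have shape (state_dim + act_dim, state_dim).
    The linear-plus-tanh form is an illustrative assumption.
    """
    inputs = np.concatenate([state, action])
    return np.tanh(inputs @ (graph * weights))

# Example: two state factors and one action dimension.
# Factor 0 has parents {state 0, action}; factor 1 has parents {state 1, action}.
graph = np.array([[1, 0],
                  [0, 1],
                  [1, 1]])
weights = np.full((3, 2), 0.5)
next_state = masked_transition(np.array([0.5, -0.2]), np.array([1.0]), graph, weights)
```

In this sketch, zeroing an entry of `graph` removes the corresponding edge, so changing a non-parent input leaves that factor's prediction unchanged. A history-based encoder would correspond to using only `past` and `current`, whereas the extended hindsight setting described above would also condition on `future`.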