Informed POMDP: Leveraging Additional Information in Model-Based RL
This work addresses model-based reinforcement learning in POMDPs, offering an incremental improvement for settings where additional information is available at training time.
The authors address learning in partially observable Markov decision processes (POMDPs) when additional information is available during training. They propose the informed POMDP paradigm and an adapted world model, and empirically show faster learning in several environments when this model is used in the Dreamer algorithm.
In this work, we generalize the problem of learning through interaction in a POMDP by accounting for possible additional information available at training time. First, we introduce the informed POMDP, a new learning paradigm offering a clear distinction between the information at training and the observation at execution. Next, we propose an objective that leverages this information for learning a sufficient statistic of the history for optimal control. We then adapt this informed objective to learn a world model able to sample latent trajectories. Finally, we empirically show a learning speed improvement in several environments using this informed world model in the Dreamer algorithm. These results and the simplicity of the proposed adaptation advocate for a systematic consideration of possible additional information when learning in a POMDP using model-based RL.
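The distinction between information at training and observation at execution can be illustrated with a small toy environment. The sketch below is not the authors' implementation: the names `ToyInformedPOMDP`, `Step`, and `collect`, the two-state dynamics, the noise model, and the choice to let the information reveal the full hidden state are all assumptions made for illustration. It only shows the interface idea: each step exposes both an observation and an information signal, the policy acts on the observation alone, and the information is logged so that a training objective could use it.

```python
import random
from dataclasses import dataclass


@dataclass
class Step:
    observation: int   # available at training and at execution
    information: int   # available at training time only
    reward: float


class ToyInformedPOMDP:
    """Hypothetical two-state environment (illustrative, not from the paper)."""

    def __init__(self, noise=0.2, seed=0):
        self.rng = random.Random(seed)
        self.noise = noise
        self.state = 0

    def reset(self):
        self.state = self.rng.randint(0, 1)
        return self._emit(reward=0.0)

    def step(self, action):
        # Reward the agent for guessing the current hidden state.
        reward = 1.0 if action == self.state else 0.0
        # The hidden state flips with probability 0.1.
        if self.rng.random() < 0.1:
            self.state = 1 - self.state
        return self._emit(reward)

    def _emit(self, reward):
        # Assumption: the information reveals the full hidden state.
        information = self.state
        # Assumption: the observation is a noisy copy of the information.
        flipped = self.rng.random() < self.noise
        observation = 1 - information if flipped else information
        return Step(observation, information, reward)


def collect(env, policy, horizon):
    """Roll out a policy that sees observations only; log information for training."""
    step = env.reset()
    trajectory = []
    for _ in range(horizon):
        action = policy(step.observation)  # the policy never reads step.information
        next_step = env.step(action)
        trajectory.append(
            (step.observation, step.information, action, next_step.reward)
        )
        step = next_step
    return trajectory


if __name__ == "__main__":
    env = ToyInformedPOMDP(noise=0.2, seed=0)
    data = collect(env, policy=lambda o: o, horizon=10)
    for observation, information, action, reward in data:
        print(observation, information, action, reward)
```

In such a setup, a world model could be trained to predict the logged `information` values from the history of observations and actions, while the deployed policy would still depend on observations only. How the informed objective is actually defined is specified in the paper, not in this sketch.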