Explore the Context: Optimal Data Collection for Context-Conditional Dynamics Models
This paper addresses the challenge of optimal data collection for context-conditional dynamics models in reinforcement learning, but it appears incremental because it builds on existing probabilistic formulations for exploration.
The paper learns dynamics models for families of dynamical systems with varying properties, using a stochastic process conditioned on a latent context variable. It demonstrates effectiveness on a toy problem and two RL environments, though no concrete numbers are provided.
In this paper, we learn dynamics models for parametrized families of dynamical systems with varying properties. The dynamics models are formulated as stochastic processes conditioned on a latent context variable, which is inferred from observed transitions of the respective system. The probabilistic formulation allows us to compute an action sequence which, for a limited number of environment interactions, optimally explores the given system within the parametrized family. This is achieved by steering the system through the transitions that are most informative about the context variable. We demonstrate the effectiveness of our method for exploration on a non-linear toy problem and two well-known reinforcement learning environments.
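The idea of choosing actions that are most informative about the context can be illustrated with a minimal sketch. This is not the paper's model: it swaps the learned stochastic process for a hand-written toy system with a scalar Gaussian context, where the information gain has a closed form, and it swaps whatever planner the authors use for simple random shooting. The dynamics `x_{t+1} = x_t + u_t + context * sin(x_t)`, the function names, and all parameter values are assumptions made up for illustration. Features are computed along a nominal rollout under the prior mean of the context, so the gain is only approximate even for this toy.

```python
import numpy as np


def rollout_features(x0, actions, context_mean):
    """Roll out the hypothetical toy system under the prior-mean context.

    Returns sin(x_t) for each step, i.e. the factor that multiplies the
    unknown context in the transition x_{t+1} = x_t + u_t + context * sin(x_t).
    """
    x, feats = x0, []
    for u in actions:
        feats.append(np.sin(x))
        x = x + u + context_mean * np.sin(x)
    return np.array(feats)


def info_gain(features, prior_var, noise_var):
    """Information gain (in nats) about a scalar Gaussian context.

    Each step yields a noisy observation feature * context + noise, so the
    posterior precision grows by sum(features**2) / noise_var.
    """
    return 0.5 * np.log1p(prior_var * np.sum(features ** 2) / noise_var)


def plan_exploration(x0, horizon, n_candidates, prior_var=1.0,
                     noise_var=0.01, context_mean=0.0, seed=0):
    """Random shooting: sample action sequences, keep the most informative."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    gains = [
        info_gain(rollout_features(x0, a, context_mean), prior_var, noise_var)
        for a in candidates
    ]
    best = int(np.argmax(gains))
    return candidates[best], gains[best]


best_actions, best_gain = plan_exploration(x0=0.0, horizon=5, n_candidates=256)
```

In this toy, staying at `x = 0` reveals nothing about the context because `sin(0) = 0`, so the planner prefers sequences that push the state toward regions where the context has a visible effect on the transition.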