Learning controllable dynamics through informative exploration
This work addresses the challenge of learning controllable dynamics in reinforcement learning when no explicit model is available, but it appears incremental because it builds on existing exploration methods.
The paper tackles the problem of learning controllable dynamics in environments without explicit models. It uses predicted information gain to guide exploration, which yields reliable estimates of the dynamics, as demonstrated by comparison with several myopic exploration approaches.
Environments with controllable dynamics are usually understood in terms of explicit models. Such models are not always available, but they may sometimes be learned by exploring an environment. In this work, we investigate using an information measure called "predicted information gain" to determine the most informative regions of an environment to explore next. Applying methods from reinforcement learning allows good suboptimal exploration policies to be found and leads to reliable estimates of the underlying controllable dynamics. This approach is demonstrated by comparing it with several myopic exploration approaches.
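The abstract does not spell out how predicted information gain is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a small discrete environment whose dynamics are estimated from transition counts with a symmetric Dirichlet prior, and it assumes the common definition of predicted information gain as the expected KL divergence between the updated and the current transition estimate. The names `estimate`, `predicted_information_gain`, and `plan`, and the parameters `alpha`, `gamma`, and `sweeps`, are hypothetical. Value iteration stands in here for "methods from reinforcement learning"; the paper may use a different planner.

```python
import numpy as np

def estimate(counts, alpha=1.0):
    """Posterior-mean transition estimate under an assumed symmetric Dirichlet prior."""
    smoothed = counts + alpha
    return smoothed / smoothed.sum(axis=-1, keepdims=True)

def predicted_information_gain(counts, alpha=1.0):
    """PIG[s, a]: expected KL divergence from the current estimate to the
    estimate obtained after one more hypothetical observation of (s, a, s')."""
    n_states, n_actions, _ = counts.shape
    p_now = estimate(counts, alpha)
    pig = np.zeros((n_states, n_actions))
    for s_next in range(n_states):
        hypothetical = counts.astype(float)   # astype returns a fresh copy
        hypothetical[:, :, s_next] += 1.0     # pretend s_next was observed
        p_new = estimate(hypothetical, alpha)
        kl = np.sum(p_new * np.log(p_new / p_now), axis=-1)
        pig += p_now[:, :, s_next] * kl       # weight by predicted probability
    return pig

def plan(counts, alpha=1.0, gamma=0.9, sweeps=200):
    """Non-myopic exploration: value iteration on the estimated dynamics,
    using PIG as the reward. A myopic explorer would instead take
    pig.argmax(axis=1) directly."""
    p_now = estimate(counts, alpha)
    pig = predicted_information_gain(counts, alpha)
    value = np.zeros(counts.shape[0])
    for _ in range(sweeps):
        q = pig + gamma * (p_now @ value)     # shape (n_states, n_actions)
        value = q.max(axis=1)
    return q.argmax(axis=1), pig

# Toy example: 3 states, 2 actions; only (state 0, action 0) has been tried.
counts = np.zeros((3, 2, 3))
counts[0, 0, 0] = 50.0
policy, pig = plan(counts)
print(policy[0], pig[0])
```

In this toy setup the heavily sampled pair (state 0, action 0) has a much lower predicted information gain than the untried pair (state 0, action 1), so the planned policy prefers action 1 in state 0. In a full exploration loop, the agent would act, update `counts` with the observed transition, and replan.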