FLEX: an Adaptive Exploration Algorithm for Nonlinear Systems
Model-based reinforcement learning is a powerful tool, but collecting the data needed to fit an accurate model of the system can be costly. Exploring an unknown environment in a sample-efficient manner is therefore of great importance. However, the complexity of the dynamics and the computational limitations of real systems make this task challenging. In this work, we introduce FLEX, an exploration algorithm for nonlinear dynamics based on optimal experimental design. Our policy maximizes the information gathered at the next step, which yields an adaptive exploration algorithm that is compatible with generic parametric learning models and requires minimal resources. We test our method on a number of nonlinear environments covering different settings, including time-varying dynamics. Since exploration is ultimately meant to serve an exploitation objective, we also evaluate our algorithm on downstream model-based classical control tasks and compare it with other state-of-the-art model-based and model-free approaches. FLEX achieves competitive performance at a low computational cost.
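The abstract describes the policy only at a high level: choose the next input so as to maximize the information gathered at the next step. The sketch below illustrates that idea under assumptions that are not taken from the paper. The dynamics are assumed linear in an unknown parameter vector theta through a hypothetical feature map phi(x, u). Information is measured with the D-optimal criterion (the log-determinant of the regularized Gram matrix), which is one standard choice in optimal experimental design; whether FLEX uses exactly this criterion is not stated here. Inputs are picked from a finite grid. The names `features`, `greedy_d_optimal_input`, and `explore`, the cosine feature, and all numeric defaults are hypothetical. This is an illustration of greedy one-step D-optimal design, not the FLEX implementation.

```python
import numpy as np


def features(x: float, u: float) -> np.ndarray:
    """Hypothetical feature map phi(x, u).

    Assumed model: x_next = theta @ phi(x, u) + noise, with theta unknown.
    """
    return np.array([x, np.cos(x), u])


def greedy_d_optimal_input(x: float, M_inv: np.ndarray, candidates: np.ndarray) -> float:
    """Return the candidate input maximizing phi^T M^{-1} phi.

    By the matrix determinant lemma,
        log det(M + phi phi^T) - log det(M) = log(1 + phi^T M^{-1} phi),
    so this greedily maximizes the one-step gain in the D-optimal criterion.
    Ties go to the first candidate (np.argmax returns the first maximum).
    """
    scores = [features(x, u) @ M_inv @ features(x, u) for u in candidates]
    return candidates[int(np.argmax(scores))]


def explore(theta_true: np.ndarray, x0: float = 0.0, steps: int = 50,
            u_max: float = 1.0, noise: float = 0.01, reg: float = 1e-3,
            seed: int = 0):
    """Run greedy information-maximizing exploration on a simulated system.

    theta_true is used only to simulate the environment; the learner only
    sees (phi, x_next) pairs and fits theta by regularized least squares.
    """
    rng = np.random.default_rng(seed)
    d = features(x0, 0.0).size
    M = reg * np.eye(d)   # regularized information (Gram) matrix
    b = np.zeros(d)       # running sum of phi * x_next
    candidates = np.linspace(-u_max, u_max, 21)  # admissible inputs
    x = x0
    for _ in range(steps):
        u = greedy_d_optimal_input(x, np.linalg.inv(M), candidates)
        phi = features(x, u)
        x_next = theta_true @ phi + noise * rng.standard_normal()  # simulated step
        M += np.outer(phi, phi)  # rank-one information update
        b += phi * x_next
        x = x_next
    theta_hat = np.linalg.solve(M, b)  # ridge least-squares estimate
    return theta_hat, M
```

For example, `explore(np.array([0.8, 0.3, 0.5]))` returns a parameter estimate and the accumulated information matrix. Thanks to the determinant lemma, scoring a candidate input costs one quadratic form rather than a full determinant. A leaner variant could also maintain `M_inv` directly with Sherman–Morrison rank-one updates instead of inverting `M` at every step.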