LG | Jun 30, 2024

Model-Free Active Exploration in Reinforcement Learning

arXiv:2407.00801v1 | 7 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of efficient exploration in RL, for researchers and practitioners alike. It offers a model-free solution applicable to both tabular and continuous MDPs, though the advance is incremental because it builds on existing information-theoretic frameworks.

The paper tackles the problem of exploration in Reinforcement Learning by deriving a model-free approximation of the instance-specific lower bound on the number of samples needed to identify a nearly-optimal policy, and by devising an ensemble-based exploration strategy from that approximation. Numerical results show that this strategy identifies efficient policies faster than state-of-the-art approaches.

We study the problem of exploration in Reinforcement Learning and present a novel model-free solution. We adopt an information-theoretical viewpoint and start from the instance-specific lower bound of the number of samples that have to be collected to identify a nearly-optimal policy. Deriving this lower bound along with the optimal exploration strategy entails solving an intricate optimization problem and requires a model of the system. In turn, most existing sample-optimal exploration algorithms rely on estimating the model. We derive an approximation of the instance-specific lower bound that only involves quantities that can be inferred using model-free approaches. Leveraging this approximation, we devise an ensemble-based model-free exploration strategy applicable to both tabular and continuous Markov decision processes. Numerical results demonstrate that our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches.
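
To make the phrase "ensemble-based model-free exploration" more concrete, here is a minimal sketch of a generic bootstrapped ensemble of tabular Q-tables on a toy chain MDP. It is not the paper's strategy: it does not use the approximated lower bound at all, and the environment, the function name `run_ensemble_q_learning`, and every hyperparameter are assumptions made purely for illustration. It only shows the general idea of learning from transitions alone (no model of the dynamics is estimated) while letting a randomly chosen ensemble member drive each episode.

```python
import random


def run_ensemble_q_learning(n_states=5, n_members=5, episodes=300,
                            horizon=20, gamma=0.9, lr=0.5,
                            mask_prob=0.7, seed=0):
    """Generic bootstrapped-ensemble Q-learning on a hypothetical chain MDP.

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    The only reward (1.0) is given on reaching the last state, which ends
    the episode. This is an illustrative stand-in, not the paper's method.
    """
    rng = random.Random(seed)
    n_actions = 2
    # Each member gets its own small random initialization so that
    # members disagree before any data has been seen.
    q = [[[rng.random() * 0.1 for _ in range(n_actions)]
          for _ in range(n_states)]
         for _ in range(n_members)]

    for _ in range(episodes):
        member = rng.randrange(n_members)  # this member drives the episode
        s = 0
        for _ in range(horizon):
            # Act greedily with respect to the sampled member.
            a = 0 if q[member][s][0] > q[member][s][1] else 1
            s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            # Model-free update: only the observed transition is used.
            # Each member sees the transition with probability mask_prob,
            # which keeps the ensemble members diverse.
            for k in range(n_members):
                if rng.random() < mask_prob:
                    target = r if done else r + gamma * max(q[k][s_next])
                    q[k][s][a] += lr * (target - q[k][s][a])
            s = s_next
            if done:
                break

    # Average the ensemble and read off a greedy policy per state.
    mean_q = [[sum(q[k][s][a] for k in range(n_members)) / n_members
               for a in range(n_actions)]
              for s in range(n_states)]
    policy = [0 if row[0] > row[1] else 1 for row in mean_q]
    return mean_q, policy


if __name__ == "__main__":
    mean_q, policy = run_ensemble_q_learning()
    print("greedy action per state:", policy)
```

In this sketch the disagreement between ensemble members is what produces exploration; the paper instead derives its exploration strategy from an approximation of the instance-specific lower bound, which this example does not attempt to reproduce.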

Code Implementations: 1 repo
