LGHC Sep 12, 2022

Deterministic Sequencing of Exploration and Exploitation for Reinforcement Learning

arXiv:2209.05408v3
4 citations
h-index: 30
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of balancing exploration and exploitation in RL to improve learning efficiency. It appears incremental because it builds on existing model-based RL frameworks.

The paper tackles the problem of simultaneously learning a system model and optimal policy in model-based reinforcement learning by proposing the DSEE algorithm, which interleaves exploration and exploitation epochs to achieve sub-linear cumulative regret growth over time.

We propose the Deterministic Sequencing of Exploration and Exploitation (DSEE) algorithm, which interleaves exploration and exploitation epochs, for model-based RL problems that aim to simultaneously learn the system model, i.e., a Markov decision process (MDP), and the associated optimal policy. During exploration, DSEE explores the environment and updates the estimates of the expected reward and transition probabilities. During exploitation, the latest estimates of the expected reward and transition probabilities are used to obtain a robust policy with high probability. We design the lengths of the exploration and exploitation epochs such that the cumulative regret grows as a sub-linear function of time.
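The interleaving described in the abstract can be illustrated with a minimal sketch on a tabular MDP. This is not the authors' implementation; several choices are assumptions made only for illustration. Exploration uses uniformly random actions, exploration epochs have a fixed length while exploitation epochs double in length (the paper's actual epoch lengths are not stated here), and plain value iteration on the estimated model stands in for the robust policy computation. The names `value_iteration`, `dsee_sketch`, `explore_len`, and `exploit_base` are hypothetical.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Greedy policy for a tabular MDP with P[s, a, s'] and R[s, a]."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)          # Q has shape (S, A)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return (R + gamma * (P @ V)).argmax(axis=1)


def dsee_sketch(P_true, R_true, n_epochs=4, explore_len=200,
                exploit_base=100, gamma=0.9, seed=0):
    """Interleave exploration and exploitation epochs (illustrative only)."""
    rng = np.random.default_rng(seed)
    S, A, _ = P_true.shape
    counts = np.zeros((S, A, S))         # transition counts from exploration
    reward_sum = np.zeros((S, A))        # summed noisy rewards from exploration
    s, schedule = 0, []
    policy = np.zeros(S, dtype=int)

    for k in range(n_epochs):
        # Exploration epoch: random actions, update the model statistics.
        for _ in range(explore_len):
            a = rng.integers(A)
            s_next = rng.choice(S, p=P_true[s, a])
            reward_sum[s, a] += R_true[s, a] + 0.1 * rng.standard_normal()
            counts[s, a, s_next] += 1
            s = s_next

        # Empirical estimates; unvisited (s, a) pairs fall back to uniform.
        n_sa = counts.sum(axis=2)
        P_hat = np.where(n_sa[..., None] > 0,
                         counts / np.maximum(n_sa, 1)[..., None], 1.0 / S)
        R_hat = reward_sum / np.maximum(n_sa, 1)

        # Exploitation epoch: act on the latest estimates, no model updates.
        # Assumption: doubling lengths, so exploration's share of time shrinks.
        policy = value_iteration(P_hat, R_hat, gamma)
        exploit_len = exploit_base * 2 ** k
        for _ in range(exploit_len):
            s = rng.choice(S, p=P_true[s, policy[s]])
        schedule.append((explore_len, exploit_len))

    return policy, schedule
```

With this schedule the total exploration time grows linearly in the number of epochs while the total exploitation time grows geometrically, which is one simple way to keep the fraction of time spent exploring shrinking. Whether a given schedule yields sub-linear regret depends on the analysis in the paper, which this sketch does not reproduce.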

