LG | Oct 13, 2025

Efficient Restarts in Non-Stationary Model-Free Reinforcement Learning

arXiv:2510.11933v1
h-index: 1
Originality: Incremental advance
AI Analysis

This work makes model-free RL more efficient in changing environments. The advance is incremental, since it builds on the existing algorithms RestartQ-UCB and RANDOMIZEDQ.

The paper addresses inefficient restarts in non-stationary model-free reinforcement learning. It proposes three restart paradigms (partial, adaptive, and selective) that target two weaknesses of earlier restart designs: complete forgetting and fixed restart schedules. The authors report up to a 91% reduction in dynamic regret relative to RestartQ-UCB.

In this work, we propose three efficient restart paradigms for model-free non-stationary reinforcement learning (RL). We identify two core issues with the restart design of Mao et al. (2022)'s RestartQ-UCB algorithm: (1) complete forgetting, where all the information learned about an environment is lost after a restart, and (2) scheduled restarts, in which restarts occur only at predefined times, regardless of how incompatible the policy is with the current environment dynamics. We introduce three approaches, which we call partial, adaptive, and selective restarts, to modify the algorithms RestartQ-UCB and RANDOMIZEDQ (Wang et al., 2025). We find near-optimal empirical performance in multiple environments, decreasing dynamic regret by up to 91% relative to RestartQ-UCB.
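To make the two issues named in the abstract concrete, here is a minimal sketch for a tabular Q-learner. It is not the paper's method: the abstract does not specify how partial or adaptive restarts are implemented, so the `keep` blending factor, the TD-error trigger, and every function name below are hypothetical choices made purely for illustration.

```python
import numpy as np


def full_restart(q, counts, q_init):
    """Complete forgetting: discard all learned values and visit counts."""
    q.fill(q_init)
    counts.fill(0)


def partial_restart(q, counts, q_init, keep=0.5):
    """Hypothetical partial restart (assumption, not from the paper):
    pull the Q-values part of the way back toward the initial value and
    shrink the visit counts instead of erasing them."""
    q[:] = keep * q + (1.0 - keep) * q_init
    counts[:] = np.floor(keep * counts).astype(counts.dtype)


def should_restart_scheduled(episode, period):
    """Scheduled restart: fires at fixed episode indices only,
    no matter how well the current policy fits the environment."""
    return episode > 0 and episode % period == 0


def should_restart_adaptive(recent_td_errors, threshold):
    """Hypothetical adaptive trigger (assumption, not from the paper):
    fire when the mean absolute TD error over a recent window is large,
    used here as a rough proxy for a change in the dynamics."""
    if len(recent_td_errors) == 0:
        return False
    return float(np.mean(np.abs(recent_td_errors))) > threshold
```

In this sketch, a scheduled design would call `full_restart` whenever `should_restart_scheduled` returns true, while a variant in the spirit of the abstract would pair a data-driven trigger with a reset that keeps some of what was learned.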

