LG · AI · Jan 27

Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning

arXiv:2601.19624v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of non-stationary environments in real-world RL applications and offers a practical solution with minimal overhead, though it is incremental, since it builds on existing entropy-scheduling methods.

The paper tackles environment drift in reinforcement learning by proposing Adaptive Entropy Scheduling (AES), which adjusts the entropy coefficient online from drift signals. Across multiple tasks and drift modes, this significantly reduces performance degradation and speeds up recovery after abrupt changes.

Real-world reinforcement learning often faces environment drift, yet most existing methods rely on a static entropy coefficient or target entropy. This causes over-exploration during stable periods and under-exploration after drift (and thus slow recovery), and it leaves unanswered the principled question of how exploration intensity should scale with drift magnitude. We prove that entropy scheduling under non-stationarity reduces to a one-dimensional, round-by-round trade-off between tracking the optimal solution faster after drift and avoiding gratuitous randomness when the environment is stable, so exploration strength can be driven by measurable online drift signals. Building on this, we propose AES (Adaptive Entropy Scheduling), which adjusts the entropy coefficient (temperature) online during training using observable drift proxies, requiring almost no structural changes and incurring minimal overhead. Across 4 algorithm variants, 12 tasks, and 4 drift modes, AES significantly reduces the fraction of performance degradation caused by drift and accelerates recovery after abrupt changes.
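The abstract does not say which drift proxies AES uses or how they are mapped to the coefficient, so the following is only a hypothetical sketch of the general idea, not the paper's method. It assumes a simple drift proxy (the gap between a short-window and a long-window mean of episode returns, in units of the long window's standard deviation) and a bounded mapping onto an entropy coefficient range. The class name `DriftAwareEntropyScheduler` and the parameters `alpha_min`, `alpha_max`, `short_len`, `long_len`, `scale`, and `decay` are all illustrative assumptions.

```python
import math
from collections import deque


class DriftAwareEntropyScheduler:
    """Hypothetical drift-aware entropy-coefficient schedule (not the paper's AES).

    Assumed drift proxy: how far the mean of the most recent episode returns
    (short window) has moved from the mean of a longer history (long window),
    measured in units of the long window's standard deviation.
    """

    def __init__(self, alpha_min=0.01, alpha_max=0.2, short_len=10,
                 long_len=100, scale=1.0, decay=0.9):
        self.alpha_min = alpha_min
        self.alpha_max = alpha_max
        self.scale = scale  # drift level at which ~63% of the alpha range is used
        self.decay = decay  # how slowly alpha relaxes once drift subsides
        self._short = deque(maxlen=short_len)
        self._long = deque(maxlen=long_len)
        self.alpha = alpha_min

    def drift_proxy(self):
        """Standardised gap between short- and long-window mean returns."""
        n = len(self._long)
        if n < 2:
            return 0.0
        mean_long = sum(self._long) / n
        var_long = sum((x - mean_long) ** 2 for x in self._long) / (n - 1)
        mean_short = sum(self._short) / len(self._short)
        return abs(mean_short - mean_long) / (math.sqrt(var_long) + 1e-8)

    def update(self, episode_return):
        """Record one episode return and return the entropy coefficient to use next."""
        self._short.append(episode_return)
        self._long.append(episode_return)
        # Squash the unbounded proxy into [0, 1) and map it onto [alpha_min, alpha_max).
        weight = 1.0 - math.exp(-self.drift_proxy() / self.scale)
        target = self.alpha_min + (self.alpha_max - self.alpha_min) * weight
        if target > self.alpha:
            # Drift detected: raise exploration immediately to speed up recovery.
            self.alpha = target
        else:
            # Stable period: let exploration decay gradually toward the target.
            self.alpha = self.decay * self.alpha + (1.0 - self.decay) * target
        return self.alpha


if __name__ == "__main__":
    sched = DriftAwareEntropyScheduler()
    for t in range(200):  # stable phase: returns hover around 100
        alpha = sched.update(99.0 if t % 2 == 0 else 101.0)
    print(f"stable alpha: {alpha:.4f}")
    for _ in range(5):  # abrupt drift: returns collapse to 20
        alpha = sched.update(20.0)
    print(f"post-drift alpha: {alpha:.4f}")
```

In a SAC-style learner, the returned `alpha` would multiply the policy-entropy term in the actor loss. The asymmetric update (jump up on drift, decay slowly when stable) is one possible way to express the trade-off the abstract describes, favouring fast tracking after a change while still shedding unnecessary randomness once the environment settles.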
