LG | Dec 13, 2021

Lifelong Hyper-Policy Optimization with Multiple Importance Sampling Regularization

arXiv:2112.06625v1 | 10 citations
Originality: Incremental advance
AI Analysis

This paper addresses lifelong learning in reinforcement learning for practical applications such as resource management and trading, but the contribution appears incremental because it builds on existing methods such as importance sampling and regularization.

The paper tackles lifelong reinforcement learning with evolving dynamics by proposing a hyper-policy that takes time as input and outputs the policy parameters to use at that time. Importance sampling lets it reuse past data, and combining the future performance estimate with past performance mitigates catastrophic forgetting. The approach is compared against state-of-the-art algorithms on realistic environments such as water resource management and trading, though the abstract gives no specific numerical gains.

Learning in a lifelong setting, where the dynamics continually evolve, is a hard challenge for current reinforcement learning algorithms. Yet this would be a much needed feature for practical applications. In this paper, we propose an approach which learns a hyper-policy, whose input is time, that outputs the parameters of the policy to be queried at that time. This hyper-policy is trained to maximize the estimated future performance, efficiently reusing past data by means of importance sampling, at the cost of introducing a controlled bias. We combine the future performance estimate with the past performance to mitigate catastrophic forgetting. To avoid overfitting the collected data, we derive a differentiable variance bound that we embed as a penalization term. Finally, we empirically validate our approach, in comparison with state-of-the-art algorithms, on realistic environments, including water resource management and trading.
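
To make the idea of a time-indexed hyper-policy with importance-sampled data reuse more concrete, here is a minimal sketch in Python. It is not the authors' estimator: it assumes a Gaussian hyper-policy whose mean is linear in hypothetical time features, and it reweights past returns with a balance-heuristic multiple importance sampling weight. The names `features`, `hyper_mean`, `log_density`, and `mis_estimate` are illustrative, and the sketch omits the variance penalty and the past-performance term that the abstract mentions.

```python
import numpy as np

def features(t):
    # Hypothetical time features: a bias term and a linear term.
    return np.array([1.0, t])

def hyper_mean(rho, t):
    # rho has shape (d, 2); returns the mean policy parameters at time t.
    return rho @ features(t)

def log_density(theta, rho, t, sigma):
    # Log-density of an isotropic Gaussian N(hyper_mean(rho, t), sigma^2 I).
    diff = theta - hyper_mean(rho, t)
    d = theta.shape[0]
    return (-0.5 * (diff @ diff) / sigma**2
            - d * np.log(sigma)
            - 0.5 * d * np.log(2.0 * np.pi))

def mis_estimate(rho_new, t_target, rho_old, times, thetas, returns, sigma):
    # Estimate the expected return at t_target under rho_new from past data.
    # Each past theta was drawn from the old hyper-policy at its own time;
    # the denominator is the mixture of those behavioural densities.
    # Reusing returns from earlier times ignores how the dynamics changed,
    # which is the source of bias in this simplified version.
    total = 0.0
    for theta, ret in zip(thetas, returns):
        target = np.exp(log_density(theta, rho_new, t_target, sigma))
        mixture = np.mean(
            [np.exp(log_density(theta, rho_old, tk, sigma)) for tk in times]
        )
        total += target / mixture * ret
    return total / len(times)
```

When the candidate hyper-policy equals the behavioural one and its mean does not depend on time, every weight is 1 and the estimate reduces to the plain average of the past returns, which is a useful sanity check for this kind of estimator.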
