LG, Dec 13, 2021

Lifelong Hyper-Policy Optimization with Multiple Importance Sampling Regularization

Pierre Liotet, Francesco Vidaich, Alberto Maria Metelli, Marcello Restelli

arXiv:2112.06625v15.510 citations

Originality: Incremental advance

AI Analysis

This paper addresses lifelong learning in reinforcement learning, which matters for practical applications such as resource management and trading. The contribution appears incremental, however, because it builds on existing techniques such as importance sampling and variance regularization.

The paper tackles lifelong reinforcement learning with evolving dynamics by proposing a hyper-policy that outputs policy parameters over time, using importance sampling to reuse past data and mitigate catastrophic forgetting. It achieves competitive performance on realistic environments like water resource management and trading, though specific numerical gains are not detailed in the abstract.

Learning in a lifelong setting, where the dynamics continually evolve, is a hard challenge for current reinforcement learning algorithms. Yet this would be a much needed feature for practical applications. In this paper, we propose an approach which learns a hyper-policy, whose input is time, that outputs the parameters of the policy to be queried at that time. This hyper-policy is trained to maximize the estimated future performance, efficiently reusing past data by means of importance sampling, at the cost of introducing a controlled bias. We combine the future performance estimate with the past performance to mitigate catastrophic forgetting. To avoid overfitting the collected data, we derive a differentiable variance bound that we embed as a penalization term. Finally, we empirically validate our approach, in comparison with state-of-the-art algorithms, on realistic environments, including water resource management and trading.
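The idea of reusing past data through importance sampling can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-dimensional policy parameter, a Gaussian hyper-policy whose mean is a linear function of time (`a * t + b`), and a balance-heuristic multiple importance sampling estimator. The names `hyper_policy_mean`, `gaussian_pdf`, and `mis_future_estimate` are hypothetical. The sketch also treats the return of a parameter as independent of time, which is exactly the kind of simplification that introduces bias when the dynamics evolve.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian, evaluated elementwise."""
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi))


def hyper_policy_mean(t, a, b):
    """Assumed hyper-policy: mean of the policy parameter at time t."""
    return a * t + b


def mis_future_estimate(thetas, returns, times, t_future, a, b, std):
    """Estimate the expected return at t_future from past samples.

    Each past parameter theta_i was drawn at time times[i]. The samples are
    reweighted by the ratio of the target density (hyper-policy at t_future)
    to the mixture of all behavioural densities (balance heuristic).
    """
    thetas = np.asarray(thetas, dtype=float)
    returns = np.asarray(returns, dtype=float)
    times = np.asarray(times, dtype=float)

    # Density of every sample under the hyper-policy queried at t_future.
    target = gaussian_pdf(thetas, hyper_policy_mean(t_future, a, b), std)

    # Density of every sample under each past time step, averaged into a mixture.
    past_means = hyper_policy_mean(times, a, b)
    mixture = gaussian_pdf(thetas[:, None], past_means[None, :], std).mean(axis=1)

    weights = target / mixture
    return float(np.mean(weights * returns))


# Usage: synthetic data, with returns peaking at theta = 1 (an arbitrary choice).
rng = np.random.default_rng(0)
a, b, std = 0.1, 0.0, 0.5
times = np.arange(10, dtype=float)
thetas = rng.normal(hyper_policy_mean(times, a, b), std)
returns = -(thetas - 1.0) ** 2
estimate = mis_future_estimate(thetas, returns, times, t_future=10.0, a=a, b=b, std=std)
```

A training loop would differentiate such an estimate with respect to `a` and `b` and subtract a variance penalty, as the abstract describes. The sketch covers the estimator only.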
