LG · AI · Aug 7, 2024

Learning Rate-Free Reinforcement Learning: A Case for Model Selection with Non-Stationary Objectives

arXiv:2408.04046v1 · 4.61 citations · h-index: 3 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses hyperparameter tuning in RL for practitioners and offers a generalizable solution. It is incremental, however, because it builds on existing model selection techniques.

The paper tackles the sensitivity of reinforcement learning (RL) algorithms to the learning rate. It introduces a model selection framework that adaptively tunes the learning rate using only reward feedback. This improves performance, and data-driven selection strategies outperform standard bandit methods in non-stationary settings.

The performance of reinforcement learning (RL) algorithms is sensitive to the choice of hyperparameters, and the learning rate is particularly influential. When the learning rate is poorly set, RL algorithms can fail to converge or require an excessive number of samples. In this work, we show that model selection can help mitigate the failure modes of RL that stem from suboptimal choices of learning rate. We present a model selection framework for Learning Rate-Free Reinforcement Learning that uses model selection methods to select the optimal learning rate on the fly. This approach to adaptive learning rate tuning depends on neither the underlying RL algorithm nor the optimizer, and it uses only the reward feedback to select the learning rate. Hence, the framework can take any RL algorithm as input and produce a learning rate-free version of it. We conduct experiments with policy optimization methods and evaluate various model selection strategies within our framework. Our results indicate that data-driven model selection algorithms are better alternatives to standard bandit algorithms when the optimal choice of hyperparameter is time-dependent and non-stationary.
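The framework's core loop can be illustrated with a small sketch. The sketch assigns each candidate learning rate its own base learner. In every round, a meta-learner picks one base learner, which runs one episode and updates. The meta-learner sees only that episode's reward. The following are illustrative assumptions, not the paper's code:

* The base learner is a hypothetical `ToyLearner`, which does noisy gradient ascent on a one-dimensional objective instead of a real policy optimization method.
* The meta-learner is UCB1, used as the "standard bandit" baseline.
* The helper names (`ucb1_choice`, `learning_rate_free`) and the learning-rate grid are invented for this example.

The paper's data-driven model selection algorithms are not reproduced here.

```python
import math
import random


class ToyLearner:
    """Hypothetical stand-in for an RL agent trained with a fixed learning rate.

    It does noisy gradient ascent on J(theta) = -(theta - 3)^2 and reports a
    reward in [0, 1] after each "episode" (one update step).
    """

    def __init__(self, lr, seed):
        self.lr = lr
        self.theta = 0.0
        self.rng = random.Random(seed)

    def run_episode(self):
        err = self.theta - 3.0
        # Noisy gradient of J at the current parameter.
        grad = -2.0 * err + self.rng.gauss(0.0, 0.1)
        self.theta += self.lr * grad
        err = self.theta - 3.0
        # exp(-err^2) keeps the reward in [0, 1]. Using err * err (not **)
        # gives inf instead of raising if a too-large learning rate diverges.
        return math.exp(-err * err)


def ucb1_choice(counts, sums, t, c=2.0):
    """Standard UCB1 index over base learners; each one is played once first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(counts)),
        key=lambda i: sums[i] / counts[i] + math.sqrt(c * math.log(t) / counts[i]),
    )


def learning_rate_free(lrs, rounds=2000, seed=0):
    """Model selection over learning rates; only the chosen learner trains."""
    learners = [ToyLearner(lr, seed + i) for i, lr in enumerate(lrs)]
    counts = [0] * len(lrs)
    sums = [0.0] * len(lrs)
    for t in range(1, rounds + 1):
        i = ucb1_choice(counts, sums, t)
        reward = learners[i].run_episode()  # reward is the only feedback used
        counts[i] += 1
        sums[i] += reward
    return counts


counts = learning_rate_free([0.001, 0.05, 1.2])
print(counts)  # number of rounds assigned to each learning rate
```

With this grid, the selector should assign most rounds to the 0.05 learner. The other two candidates earn rewards near zero: 0.001 improves too slowly, and 1.2 makes the update diverge. Each learner's reward also drifts upward as it trains, so the arms are not stationary. UCB1 averages over that whole history. This mismatch is the kind the abstract points to when it favors data-driven selectors for time-dependent optimal hyperparameters.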
