Hyperparameters in Contextual RL are Highly Situational
The work highlights a critical problem for RL practitioners trying to deploy stable systems and offers incremental insights into hyperparameter sensitivity.
The paper tackles the instability of reinforcement learning (RL) in real-world applications. It demonstrates that automatically optimized hyperparameters are highly situational: they depend on the specifics of the problem and on how well the state describes the environment dynamics. In particular, agents in contextual RL require different hyperparameters when they are shown how environmental factors change.
Although Reinforcement Learning (RL) has shown impressive results in games and simulation, real-world application of RL suffers from its instability under changing environment conditions and hyperparameters. We give a first impression of the extent of this instability by showing that the hyperparameters found by automatic hyperparameter optimization (HPO) methods are not only dependent on the problem at hand, but even on how well the state describes the environment dynamics. Specifically, we show that agents in contextual RL require different hyperparameters if they are shown how environmental factors change. In addition, finding adequate hyperparameter configurations is not equally easy for both settings, further highlighting the need for research into how hyperparameters influence learning and generalization in RL.
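The two settings compared in the abstract can be illustrated with a minimal sketch: in one, the agent sees only the raw state; in the other, the context features are appended to the observation. This is not the authors' code. The names `Context`, `build_observation`, `gravity`, and `pole_length` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Context:
    """Hypothetical environmental factors that vary between task instances."""
    gravity: float
    pole_length: float

    def as_features(self) -> List[float]:
        # Flatten the context into a list that can be appended to the state.
        return [self.gravity, self.pole_length]


def build_observation(
    state: Sequence[float], context: Context, show_context: bool
) -> List[float]:
    """Return what the agent observes in one of the two settings.

    show_context=False: the agent sees only the raw state (context hidden).
    show_context=True:  the context features are appended (context visible).
    """
    obs = list(state)
    if show_context:
        obs.extend(context.as_features())
    return obs


# Illustrative values only; they are not taken from the paper.
state = [0.1, -0.2, 0.03, 0.5]
ctx = Context(gravity=9.8, pole_length=0.5)

hidden_obs = build_observation(state, ctx, show_context=False)
visible_obs = build_observation(state, ctx, show_context=True)

print(len(hidden_obs), len(visible_obs))
```

Under the abstract's claim, an HPO method would be run separately for each setting, since a configuration tuned with the context hidden may not transfer to the agent that observes it.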