Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions
This work addresses the challenge of adapting RL agents to dynamic real-world conditions, but its contribution is incremental, since it builds on existing Q-learning methods.
The paper tackles the difficulty reinforcement learning agents have with non-stationary environments, such as changing reward functions and expanding action spaces, by introducing MORPHIN, a self-adaptive Q-learning framework that enables on-the-fly adaptation without full retraining. Results show that MORPHIN improves learning efficiency by up to 1.7x over a standard Q-learning baseline on a Gridworld benchmark and a traffic signal control simulation.
Reinforcement Learning (RL) agents often struggle in real-world applications where environmental conditions are non-stationary, particularly when reward functions shift or the available action space expands. This paper introduces MORPHIN, a self-adaptive Q-learning framework that enables on-the-fly adaptation without full retraining. By integrating concept drift detection with dynamic adjustments to learning and exploration hyperparameters, MORPHIN adapts agents both to changes in the reward function and to on-the-fly expansions of the action space, while preserving prior policy knowledge to prevent catastrophic forgetting. We validate our approach using a Gridworld benchmark and a traffic signal control simulation. The results demonstrate that MORPHIN achieves faster convergence and continuous adaptation compared to a standard Q-learning baseline, improving learning efficiency by up to 1.7x.
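To make the mechanism described in the abstract concrete, the following is a minimal sketch of a tabular Q-learning agent that reacts to drift and to new actions. It is not the MORPHIN implementation. The abstract does not name the drift detector or the adjustment schedule, so the Page-Hinkley test, the class names `PageHinkley` and `AdaptiveQAgent`, and every default value below are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


class PageHinkley:
    """Page-Hinkley test for a sustained DROP in a stream (assumed detector)."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated deviation (illustrative value)
        self.threshold = threshold  # alarm threshold (illustrative value)
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0
        self.max_cum = 0.0

    def update(self, x):
        """Feed one observation; return True if a drop is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean + self.delta
        self.max_cum = max(self.max_cum, self.cum)
        return (self.max_cum - self.cum) > self.threshold


class AdaptiveQAgent:
    """Tabular Q-learning with drift-triggered hyperparameter resets (sketch)."""

    def __init__(self, actions, alpha0=0.5, alpha_min=0.05,
                 eps_min=0.05, decay=0.99, gamma=0.95):
        self.actions = list(actions)
        self.alpha0, self.alpha_min = alpha0, alpha_min
        self.eps_min, self.decay, self.gamma = eps_min, decay, gamma
        self.alpha, self.eps = alpha0, 1.0
        self.q = defaultdict(float)  # (state, action) -> value, default 0.0
        self.detector = PageHinkley()

    def act(self, state, rng=random):
        # Epsilon-greedy selection over the CURRENT action set.
        if rng.random() < self.eps:
            return rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, s, a, r, s_next, done):
        # Standard one-step Q-learning update.
        best_next = 0.0 if done else max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def _boost(self):
        # Raise learning rate and exploration again; the Q-table is kept,
        # so previously learned values are not discarded.
        self.alpha, self.eps = self.alpha0, 1.0
        self.detector.reset()

    def end_episode(self, episode_return):
        """Call once per episode; returns True if drift was flagged."""
        if self.detector.update(episode_return):
            self._boost()
            return True
        self.alpha = max(self.alpha_min, self.alpha * self.decay)
        self.eps = max(self.eps_min, self.eps * self.decay)
        return False

    def add_action(self, action):
        # New actions start at Q = 0 through the defaultdict; old entries stay.
        if action not in self.actions:
            self.actions.append(action)
            self._boost()
```

In this sketch the Q-table is never cleared, which is one simple way to keep prior policy knowledge while the agent re-explores. A call such as `agent.add_action("up")` extends the action set in place, and `agent.end_episode(total_reward)` is the only hook needed for reward-drift monitoring.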