LG | AI | Jan 28

Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions

Raul de la Rosa, Ivana Dusparic, Nicolas Cardozo

arXiv:2601.20714v1 | 1.4 | h-index: 22 | 2025 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of adapting RL agents to dynamic real-world conditions, but the advance is incremental because it builds on existing Q-learning methods.

The paper tackles the problem of reinforcement learning agents struggling in non-stationary environments, where reward functions shift or the action space expands, by introducing MORPHIN, a self-adaptive Q-learning framework that enables on-the-fly adaptation without full retraining. Results show that MORPHIN improves learning efficiency by up to 1.7x over a standard Q-learning baseline on a Gridworld benchmark and a traffic signal control simulation.

Reinforcement Learning (RL) agents often struggle in real-world applications where environmental conditions are non-stationary, particularly when reward functions shift or the available action space expands. This paper introduces MORPHIN, a self-adaptive Q-learning framework that enables on-the-fly adaptation without full retraining. By integrating concept drift detection with dynamic adjustments to learning and exploration hyperparameters, MORPHIN adapts agents both to changes in the reward function and to on-the-fly expansions of the agent's action space, while preserving prior policy knowledge to prevent catastrophic forgetting. We validate our approach using a Gridworld benchmark and a traffic signal control simulation. The results demonstrate that MORPHIN achieves superior convergence speed and continuous adaptation compared to a standard Q-learning baseline, improving learning efficiency by up to 1.7x.
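
The general idea in the abstract (drift detection driving temporary boosts to the learning and exploration rates, with the Q-table kept intact) can be sketched roughly as follows. This is an illustrative sketch, not MORPHIN itself: the abstract does not say which drift detector is used or which signal it monitors, so a two-sided Page-Hinkley-style test on the reward stream is assumed here. The class names `PageHinkley` and `AdaptiveQAgent`, the method names, and all hyperparameter values are hypothetical.

```python
import random
from collections import defaultdict


class PageHinkley:
    """Two-sided Page-Hinkley-style test for a change in the mean of a stream.
    (Assumed detector; the abstract does not name one.)"""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated deviation from the running mean
        self.threshold = threshold  # alarm level for the cumulative deviation
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum_up = 0.0
        self.cum_down = 0.0

    def update(self, x):
        """Feed one observation; return True if drift is flagged."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum_up = max(0.0, self.cum_up + x - self.mean - self.delta)
        self.cum_down = max(0.0, self.cum_down + self.mean - x - self.delta)
        return max(self.cum_up, self.cum_down) > self.threshold


class AdaptiveQAgent:
    """Tabular Q-learning whose alpha/epsilon are boosted on drift or when
    the action set grows. The Q-table is never reset (hypothetical design)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.05,
                 boosted_alpha=0.5, boosted_epsilon=0.5, decay=0.99):
        self.actions = list(actions)
        self.base_alpha, self.base_epsilon = alpha, epsilon
        self.alpha, self.epsilon = alpha, epsilon
        self.boosted_alpha, self.boosted_epsilon = boosted_alpha, boosted_epsilon
        self.gamma, self.decay = gamma, decay
        self.q = defaultdict(float)  # (state, action) -> value, default 0.0
        self.detector = PageHinkley()

    def boost(self):
        """Raise learning and exploration rates; restart the detector."""
        self.alpha, self.epsilon = self.boosted_alpha, self.boosted_epsilon
        self.detector.reset()

    def add_action(self, action):
        """Expand the action space; existing Q-values are left untouched."""
        if action not in self.actions:
            self.actions.append(action)
            self.boost()

    def act(self, state):
        """Epsilon-greedy action selection over the current action set."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        """Standard Q-learning update, then drift check on the reward stream."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
        if self.detector.update(reward):
            self.boost()
        else:
            # Anneal back toward the base rates once conditions are stable.
            self.alpha = max(self.base_alpha, self.alpha * self.decay)
            self.epsilon = max(self.base_epsilon, self.epsilon * self.decay)
        return td_error
```

In this sketch, a reward shift or a call to `add_action` only changes how fast the agent learns and how much it explores; the values already stored in `q` carry over, which is one simple way to retain prior policy knowledge instead of retraining from scratch.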
