LG | AI | Jan 28

Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions

arXiv:2601.20714v1 | h-index: 22 | 2025 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of adapting RL agents to dynamic real-world conditions, but the advance is incremental because it builds on existing Q-learning methods.

The paper targets reinforcement learning agents that struggle in non-stationary environments, where reward functions shift or the action space expands. It introduces MORPHIN, a self-adaptive Q-learning framework that adapts on the fly without full retraining. On a Gridworld benchmark and a traffic signal control simulation, MORPHIN improves learning efficiency by up to 1.7x over a standard Q-learning baseline.

Reinforcement Learning (RL) agents often struggle in real-world applications where environmental conditions are non-stationary, particularly when reward functions shift or the available action space expands. This paper introduces MORPHIN, a self-adaptive Q-learning framework that enables on-the-fly adaptation without full retraining. By integrating concept drift detection with dynamic adjustments to learning and exploration hyperparameters, MORPHIN adapts agents both to changes in the reward function and to on-the-fly expansions of the agent's action space, while preserving prior policy knowledge to prevent catastrophic forgetting. We validate our approach using a Gridworld benchmark and a traffic signal control simulation. The results demonstrate that MORPHIN achieves superior convergence speed and continuous adaptation compared to a standard Q-learning baseline, improving learning efficiency by up to 1.7x.
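
The abstract names the ingredients (drift detection, adjustment of learning and exploration hyperparameters, action-space expansion that keeps prior knowledge) but not the exact rules, so the following is only a minimal sketch of the general idea and not MORPHIN itself. It assumes a tabular agent, a Page-Hinkley test on the absolute TD error as the drift signal, and a simple "boost then decay" schedule; the names `PageHinkley`, `AdaptiveQAgent`, `add_actions`, and `alpha_boost`, and all default values, are hypothetical.

```python
import random
from collections import defaultdict


class PageHinkley:
    """Page-Hinkley test: flags an upward shift in the mean of a stream."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated drift magnitude (assumed value)
        self.threshold = threshold  # alarm level (assumed value)
        self.reset()

    def reset(self):
        self.n, self.mean, self.cum, self.cum_min = 0, 0.0, 0.0, 0.0

    def update(self, x):
        """Feed one observation; return True when drift is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.cum - self.cum_min > self.threshold:
            self.reset()
            return True
        return False


class AdaptiveQAgent:
    """Tabular Q-learning whose alpha/epsilon are re-raised on detected drift."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, eps_min=0.05,
                 eps_max=1.0, decay=0.995, alpha_boost=0.5):
        self.n_actions = n_actions
        self.alpha_base, self.alpha, self.alpha_boost = alpha, alpha, alpha_boost
        self.gamma, self.decay = gamma, decay
        self.eps_min, self.eps_max, self.epsilon = eps_min, eps_max, eps_max
        self.q = defaultdict(lambda: [0.0] * self.n_actions)
        self.detector = PageHinkley()
        self.drift_count = 0

    def act(self, state):
        # Epsilon-greedy action selection over the current action set.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def learn(self, s, a, r, s_next, done):
        target = r if done else r + self.gamma * max(self.q[s_next])
        td_error = target - self.q[s][a]
        self.q[s][a] += self.alpha * td_error
        if self.detector.update(abs(td_error)):
            # Drift: explore and learn faster again, but keep the Q-table.
            self.epsilon, self.alpha = self.eps_max, self.alpha_boost
            self.drift_count += 1
        else:
            # No drift: anneal both hyperparameters back toward their floors.
            self.epsilon = max(self.eps_min, self.epsilon * self.decay)
            self.alpha = max(self.alpha_base, self.alpha * self.decay)
        return td_error

    def add_actions(self, k, init=0.0):
        """Append k new actions; existing Q-values are left untouched."""
        for values in self.q.values():
            values.extend([init] * k)
        self.n_actions += k
        self.epsilon = self.eps_max  # re-explore so new actions get tried
```

In this sketch a reward shift shows up as a sustained rise in TD error, which triggers renewed exploration, while `add_actions` widens every stored row instead of resetting the table, so previously learned values carry over.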

