Robust Deep Reinforcement Learning Scheduling via Weight Anchoring
This work addresses robustness issues faced by practitioners deploying data-driven scheduling methods in real-world environments, though the contribution is incremental, since it adapts an existing continual learning technique.
The paper tackles the loss of robustness in deep reinforcement learning when moving from simulation to reality. It uses weight anchoring to preserve desired behaviors and reports performance comparable to state-of-the-art simulation augmentation, with significantly increased robustness and steerability.
Questions remain about the robustness of data-driven learning methods when crossing the gap from simulation to reality. We utilize weight anchoring, a method known from continual learning, to cultivate and fix desired behavior in neural networks. Weight anchoring may be used to find a solution to a learning problem that is near the solution of another learning problem. Thereby, learning can be carried out in optimal environments without neglecting or unlearning desired behavior. We demonstrate this approach on the example of learning mixed QoS-efficient discrete resource scheduling with infrequent priority messages. Results show that this method provides performance comparable to the state of the art of augmenting a simulation environment, alongside significantly increased robustness and steerability.
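The idea of finding a solution "near" another solution can be illustrated with a minimal sketch. The abstract does not state the exact penalty used, so the code below assumes a common formulation from continual learning: a quadratic pull toward stored anchor weights, scaled per weight by an importance value (as in elastic weight consolidation). The toy losses, the names `grad_task_b`, `grad_anchor`, and `train`, and all numbers are hypothetical and chosen only for illustration.

```python
# Minimal sketch of weight anchoring on a toy 2-parameter problem.
# Assumption: the anchoring penalty is
#   0.5 * strength * sum_i importance_i * (theta_i - anchor_i)^2
# which is one common choice, not necessarily the paper's exact form.

def grad_task_b(theta):
    """Gradient of a toy loss for the new task B: 0.5 * ||theta - target_b||^2."""
    target_b = [4.0, 0.0]  # hypothetical optimum of task B
    return [t - tb for t, tb in zip(theta, target_b)]


def grad_anchor(theta, anchor, importance, strength):
    """Gradient of the quadratic anchoring penalty toward the anchor weights."""
    return [strength * f * (t - a) for t, a, f in zip(theta, anchor, importance)]


def train(theta, anchor, importance, strength, lr=0.1, steps=500):
    """Plain gradient descent on task-B loss plus the anchoring penalty."""
    theta = list(theta)
    for _ in range(steps):
        g_task = grad_task_b(theta)
        g_anchor = grad_anchor(theta, anchor, importance, strength)
        theta = [t - lr * (gt + ga) for t, gt, ga in zip(theta, g_task, g_anchor)]
    return theta


# Weights assumed to encode the desired behavior (learned on a task A).
anchor = [0.0, 2.0]
# Assumed importance: only the second weight matters for the desired behavior.
importance = [0.0, 1.0]

free = train(anchor, anchor, importance, strength=0.0)      # no anchoring
anchored = train(anchor, anchor, importance, strength=9.0)  # with anchoring

print("without anchoring:", free)
print("with anchoring:   ", anchored)
```

In this toy setting, training without anchoring moves both weights to the task-B optimum, so the second weight drifts from 2 to 0 and the anchored behavior is lost. With anchoring, the unimportant first weight still reaches its task-B optimum, while the important second weight stays close to its anchor (about 1.8 here). The `strength` parameter trades off the two objectives, which is one way to read the steerability mentioned in the abstract.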