NE AI | Jun 20, 2025

Robust Dynamic Material Handling via Adaptive Constrained Evolutionary Reinforcement Learning

Chengpeng Hu, Ziming Wang, Bo Yuan, Jialin Liu, Chengqi Zhang, Xin Yao

arXiv:2506.16795v1 | h-index: 5 | IEEE Trans Neural Netw Learn Syst

Originality: Incremental advance

AI Analysis

This work addresses real-time scheduling in logistics and manufacturing, offering a robust solution for dynamic task assignment with constraints, though it appears incremental as it builds on existing reinforcement learning methods.

The paper tackles dynamic material handling by proposing ACERL, an adaptive constrained evolutionary reinforcement learning approach that schedules vehicles to minimize makespan and tardiness while satisfying constraints, achieving outstanding performance on eight test instances and robust results on 40 noised instances compared to state-of-the-art algorithms.

Dynamic material handling (DMH) involves assigning dynamically arriving material transport tasks to suitable vehicles in real time so as to minimise makespan and tardiness. In real-world scenarios, historical task records are usually available, which makes it possible to train a decision policy on multiple instances built from those records. Recently, reinforcement learning has been applied to DMH. Because dynamic events such as new tasks occur, adaptability is essential. Solving DMH is challenging because constraints, including limits on task delay, must be satisfied. Feedback is received only when all tasks have been served, which leads to sparse rewards. In addition, it is crucial to make the best use of limited computational resources and historical records when training a robust policy, since the time allocated to different problem instances strongly affects the learning process. To tackle these challenges, this paper proposes a novel adaptive constrained evolutionary reinforcement learning (ACERL) approach, which maintains a population of actors for diverse exploration. ACERL assesses each actor to tackle sparse rewards and constraint violations, restricting the behaviour of the policy. Moreover, ACERL adaptively selects the most beneficial training instances for improving the policy. Extensive experiments on eight training and eight unseen test instances demonstrate the outstanding performance of ACERL compared with several state-of-the-art algorithms. Policies trained by ACERL can schedule the vehicles while fully satisfying the constraints. Additional experiments on 40 unseen noised instances show the robust performance of ACERL. Cross-validation further demonstrates the overall effectiveness of ACERL. Finally, a rigorous ablation study highlights the coordination and benefits of each component of ACERL.
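To make the two objectives and the constraint handling in the abstract concrete, here is a minimal Python sketch. It rests on assumptions rather than on the paper's definitions: makespan is taken as the latest completion time, tardiness as the summed lateness over tasks, and actors are ranked with a common feasibility-first rule (lower constraint violation first, then lower cost). This is not necessarily how ACERL ranks its population. The names `Task`, `makespan`, `total_tardiness`, and `rank_actors`, as well as all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    completion: float  # time at which the task was finished
    due: float         # latest acceptable finish time (assumed field)


def makespan(tasks):
    """Completion time of the last task to finish."""
    return max(t.completion for t in tasks)


def total_tardiness(tasks):
    """Sum of lateness over all tasks; on-time tasks contribute 0."""
    return sum(max(0.0, t.completion - t.due) for t in tasks)


def rank_actors(results):
    """Order actor names feasibility-first (an assumed rule, not the paper's).

    `results` maps an actor name to a (cost, violation) pair.
    Lower violation ranks first; ties are broken by lower cost.
    """
    return sorted(results, key=lambda name: (results[name][1], results[name][0]))


# Illustrative schedule outcome for one instance (made-up values).
schedule = [Task(10.0, 12.0), Task(18.0, 15.0), Task(25.0, 20.0)]
ms = makespan(schedule)
td = total_tardiness(schedule)

# Illustrative population evaluation: (cost, constraint violation) per actor.
results = {
    "actor_a": (30.0, 0.0),
    "actor_b": (25.0, 2.0),
    "actor_c": (28.0, 0.0),
}
order = rank_actors(results)
```

In this sketch, `actor_b` has the lowest cost but ranks last because it violates a constraint, which mirrors the abstract's emphasis on policies that fully satisfy the constraints.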
