Reinforcement Learning with Curriculum-inspired Adaptive Direct Policy Guidance for Truck Dispatching
This paper addresses truck dispatching inefficiencies in mining operations, though the contribution appears incremental, since it adapts existing RL techniques.
The paper tackles inefficient truck dispatching in open-pit mining by introducing a curriculum learning strategy for policy-based reinforcement learning, resulting in a 10% performance gain and faster convergence over standard methods.
Efficient truck dispatching via Reinforcement Learning (RL) in open-pit mining is often hindered by reliance on complex reward engineering and value-based methods. This paper introduces Curriculum-inspired Adaptive Direct Policy Guidance, a novel curriculum learning strategy for policy-based RL that addresses these issues. We adapt Proximal Policy Optimization (PPO) to the uneven decision intervals of mine dispatching by using time deltas in the Temporal Difference (TD) and Generalized Advantage Estimation (GAE) computations, and we employ a Shortest Processing Time (SPT) teacher policy for guided exploration via policy regularization and adaptive guidance. Evaluations in OpenMines demonstrate that our approach yields a 10% performance gain and faster convergence over standard PPO in both sparse and dense reward settings, showing improved robustness to reward design. This direct policy guidance method provides a general and effective curriculum learning technique for RL-based truck dispatching and enables future work on advanced architectures.
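The abstract does not give the exact equations, so the following is only a minimal sketch of how time deltas might enter GAE and how an SPT teacher might regularize the policy. It assumes that the discount over an interval of length dt is gamma ** dt and that guidance is a cross-entropy penalty toward the teacher's action, weighted by a coefficient beta. The function names (time_aware_gae, spt_teacher_action, guidance_loss) and all parameters are hypothetical; none of them come from the paper or from OpenMines.

```python
import numpy as np

def time_aware_gae(rewards, values, last_value, dts, gamma=0.99, lam=0.95):
    """GAE with per-step time deltas (assumed form, not the paper's equations).

    dts[t] is the elapsed time between decision t and decision t + 1.
    With all dts equal to 1 this reduces to standard GAE.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    dts = np.asarray(dts, dtype=float)
    next_values = np.append(values[1:], last_value)
    # TD error with a time-dependent discount gamma ** dt.
    deltas = rewards + gamma ** dts * next_values - values
    advantages = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # The trace decays by (gamma * lam) ** dt instead of gamma * lam.
        running = deltas[t] + (gamma * lam) ** dts[t] * running
        advantages[t] = running
    returns = advantages + values
    return advantages, returns

def spt_teacher_action(processing_times):
    """Shortest Processing Time rule: pick the option with the smallest time."""
    return int(np.argmin(processing_times))

def guidance_loss(logits, teacher_actions, beta):
    """Cross-entropy toward the teacher's actions, scaled by beta (assumed form).

    In an adaptive scheme beta would be changed over training; how it is
    adapted is not specified in the abstract, so it is left to the caller.
    """
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(teacher_actions)), teacher_actions]
    return -beta * picked.mean()
```

In a PPO training loop, a term like guidance_loss would be added to the clipped surrogate objective, and the advantages from time_aware_gae would replace those from fixed-step GAE. Both choices are illustrative rather than a reconstruction of the authors' implementation.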