Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling
Mining-Gym gives researchers and practitioners in mining logistics a domain-specific tool for benchmarking reinforcement learning (RL) algorithms, though the contribution is incremental because it builds on existing simulation and RL frameworks.
The paper addresses the lack of standardized benchmarking for reinforcement learning (RL) in mining truck dispatch scheduling by introducing Mining-Gym, a configurable open-source environment. It validates the environment by comparing heuristics with RL-based scheduling across six scenarios; the results indicate that Mining-Gym enables reproducible evaluation and that RL-trained schedulers have strong performance potential.
Optimizing the mining process -- particularly truck dispatch scheduling -- is a key driver of efficiency in open-pit operations. However, the dynamic and stochastic nature of these environments, with uncertainties such as equipment failures, truck maintenance, and variable haul cycle times, challenges traditional optimization. While Reinforcement Learning (RL) shows strong potential for adaptive decision-making in mining logistics, practical deployment requires evaluation in realistic, customizable simulation environments. The lack of standardized benchmarking hampers fair algorithm comparison, reproducibility, and real-world applicability of RL solutions. To address this, we present Mining-Gym -- a configurable, open-source benchmarking environment for training, testing, and evaluating RL algorithms in mining process optimization. Built on Salabim-based Discrete Event Simulation (DES) and integrated with Gymnasium, Mining-Gym captures mining-specific uncertainties through an event-driven decision-point architecture. It offers a GUI for parameter configuration, data logging, and real-time visualization, supporting reproducible evaluation of RL strategies and heuristic baselines. We validate Mining-Gym by comparing classical heuristics with RL-based scheduling across six scenarios from normal operation to severe equipment failures. Results show it is an effective, reproducible testbed, enabling fair evaluation of adaptive decision-making and demonstrating the strong performance potential of RL-trained schedulers.
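The event-driven decision-point idea described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not Mining-Gym's code and uses neither Salabim nor Gymnasium. The class name ToyDispatchEnv, its parameters, the time distributions, the 5% breakdown probability, and the queueing-time reward are all hypothetical choices made for illustration. The simulation clock jumps from event to event, and control returns to the agent only when a truck needs an assignment, following the reset/step convention used by Gymnasium.

```python
import heapq
import random


class ToyDispatchEnv:
    """Hypothetical Gym-style dispatch environment (not Mining-Gym's API)."""

    def __init__(self, n_trucks=3, n_shovels=2, horizon=480.0, seed=0):
        self.n_trucks = n_trucks
        self.n_shovels = n_shovels
        self.horizon = horizon  # simulated minutes per episode
        self.seed = seed

    def reset(self):
        self.rng = random.Random(self.seed)
        self.delivered = 0
        # Time at which each shovel finishes the trucks already assigned to it
        # (simplification: trucks are served in assignment order).
        self.shovel_free_at = [0.0] * self.n_shovels
        # Event queue of (time, truck_id, loads_delivered_by_this_event).
        self.events = [(0.0, t, 0) for t in range(self.n_trucks)]
        heapq.heapify(self.events)
        return self._next_decision(), {}

    def _next_decision(self):
        # Jump the clock to the next event: that truck now needs an assignment.
        self.now, self.truck, loads = heapq.heappop(self.events)
        self.delivered += loads
        # Observation: remaining busy time (minutes) at each shovel.
        return [max(0.0, t - self.now) for t in self.shovel_free_at]

    def step(self, action):
        travel = self.rng.uniform(4.0, 8.0)   # empty trip to the shovel
        load = self.rng.uniform(2.0, 4.0)     # loading time
        haul = self.rng.uniform(6.0, 12.0)    # loaded trip plus dumping
        if self.rng.random() < 0.05:          # assumed random breakdown delay
            haul += 30.0
        arrive = self.now + travel
        start = max(arrive, self.shovel_free_at[action])
        self.shovel_free_at[action] = start + load
        heapq.heappush(self.events, (start + load + haul, self.truck, 1))
        reward = -(start - arrive)  # penalize time spent queueing at the shovel
        obs = self._next_decision()
        terminated = self.now >= self.horizon
        info = {"time": self.now, "delivered": self.delivered}
        return obs, reward, terminated, False, info


def shortest_queue_policy(obs):
    """Heuristic baseline: send the truck to the least busy shovel."""
    return min(range(len(obs)), key=lambda i: obs[i])


def run_episode(env, policy):
    """Run one episode and return the number of loads delivered."""
    obs, _ = env.reset()
    terminated = False
    info = {"delivered": 0}
    while not terminated:
        obs, _, terminated, _, info = env.step(policy(obs))
    return info["delivered"]


if __name__ == "__main__":
    rng = random.Random(0)
    print("shortest queue:", run_episode(ToyDispatchEnv(), shortest_queue_policy))
    print("random:", run_episode(ToyDispatchEnv(), lambda obs: rng.randrange(len(obs))))
```

Because the environment reseeds its random generator on every reset, two runs with the same seed and policy produce the same episode. This is the kind of reproducibility a benchmarking environment needs when comparing heuristics against learned schedulers.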