Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling
This work addresses production scheduling for an industry partner, but it is incremental: it builds on the existing trend of integrating RL with heuristics.
The paper tackles a real-world multiobjective production scheduling problem by using reinforcement learning as an improvement heuristic: starting from a suboptimal solution, it iteratively swaps pairs of jobs sampled from learned probabilities. On real data from an industry partner, it outperforms the other heuristics it was benchmarked against.
The integration of Reinforcement Learning (RL) with heuristic methods is an emerging trend for solving optimization problems, which leverages RL's ability to learn from the data generated during the search process. One promising approach is to train an RL agent as an improvement heuristic, starting with a suboptimal solution that is iteratively improved by applying small changes. We apply this approach to a real-world multiobjective production scheduling problem. Our approach utilizes a network architecture that includes Transformer encoding to learn the relationships between jobs. Afterwards, a probability matrix is generated from which pairs of jobs are sampled and then swapped to improve the solution. We benchmarked our approach against other heuristics using real data from our industry partner, demonstrating its superior performance.
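The swap-based improvement loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The learned Transformer policy is replaced by a hypothetical placeholder `uniform_scores` that scores every pair equally, the multiobjective cost is replaced by a single assumed objective (total tardiness on one machine), and the choice to always apply the sampled swap while remembering the best sequence seen is also an assumption. All function names and the toy data are illustrative.

```python
import math
import random


def total_tardiness(order, durations, due_dates):
    """Assumed stand-in objective: summed tardiness on a single machine."""
    t, cost = 0, 0
    for job in order:
        t += durations[job]
        cost += max(0, t - due_dates[job])
    return cost


def pair_probabilities(scores):
    """Softmax over the upper triangle of an n x n score matrix.

    Returns the list of position pairs (i, j) with i < j and their probabilities.
    """
    n = len(scores)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    top = max(scores[i][j] for i, j in pairs)  # subtract max for stability
    weights = [math.exp(scores[i][j] - top) for i, j in pairs]
    z = sum(weights)
    return pairs, [w / z for w in weights]


def uniform_scores(order):
    """Hypothetical placeholder for the learned policy: all pairs equal."""
    n = len(order)
    return [[0.0] * n for _ in range(n)]


def improve(order, durations, due_dates, score_fn, steps=200, seed=0):
    """Sample a pair of positions, swap the jobs, and track the best sequence."""
    rng = random.Random(seed)
    current = list(order)
    best = list(order)
    best_cost = total_tardiness(best, durations, due_dates)
    for _ in range(steps):
        pairs, probs = pair_probabilities(score_fn(current))
        i, j = rng.choices(pairs, weights=probs, k=1)[0]
        current[i], current[j] = current[j], current[i]  # apply the swap
        cost = total_tardiness(current, durations, due_dates)
        if cost < best_cost:
            best, best_cost = list(current), cost
    return best, best_cost


# Toy instance (made-up numbers, not from the paper).
durations = [3, 2, 4, 1]
due_dates = [4, 2, 10, 3]
start = [0, 1, 2, 3]
best, best_cost = improve(start, durations, due_dates, uniform_scores)
print(start, total_tardiness(start, durations, due_dates))
print(best, best_cost)
```

In a learned variant, `uniform_scores` would be replaced by a model that maps the current sequence to pair scores, so that swaps likely to reduce the cost are sampled more often.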