An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming
This work addresses the problem of efficient scheduling for real-world applications by providing a scalable method that reduces reliance on handcrafted heuristics, though it is incremental as it builds on existing CP and RL techniques.
The paper tackles the scalability issue of Constraint Programming (CP) solvers for large Job-Shop Scheduling Problem (JSSP) instances by proposing an end-to-end Reinforcement Learning (RL) approach that learns a Priority Dispatching Rule (PDR) from generic CP encodings and small instances, achieving higher-quality solutions for very large instances compared to static PDRs and CP solvers within the same time limit.
Constraint Programming (CP) is a declarative programming paradigm that allows for modeling and solving combinatorial optimization problems, such as the Job-Shop Scheduling Problem (JSSP). While CP solvers manage to find optimal or near-optimal solutions for small instances, they do not scale well to large ones, i.e., they require long computation times or yield low-quality solutions. Therefore, real-world scheduling applications often resort to fast, handcrafted, priority-based dispatching heuristics to find a good initial solution and then refine it using optimization methods. This paper proposes a novel end-to-end approach to solving scheduling problems by means of CP and Reinforcement Learning (RL). In contrast to previous RL methods, tailored for a given problem by including procedural simulation algorithms, complex feature engineering, or handcrafted reward functions, our neural-network architecture and training algorithm merely require a generic CP encoding of some scheduling problem along with a set of small instances. Our approach leverages existing CP solvers to train an agent learning a Priority Dispatching Rule (PDR) that generalizes well to large instances, even from separate datasets. We evaluate our method on seven JSSP datasets from the literature, showing its ability to find higher-quality solutions for very large instances than obtained by static PDRs and by a CP solver within the same time limit.