LGMay 6, 2025Code
Unraveling the Rainbow: can value-based methods schedule?Arthur Corrêa, Alexandre Jesus, Cristóvão Silva et al.
Recently, deep reinforcement learning has emerged as a promising approach for solving complex combinatorial optimization problems. Broadly, deep reinforcement learning methods fall into two categories: policy-based and value-based. While value-based approaches have achieved notable success in domains such as the Arcade Learning Environment, the combinatorial optimization community has predominantly favored policy-based methods, often overlooking the potential of value-based algorithms. In this work, we conduct a comprehensive empirical evaluation of value-based algorithms, including the deep q-network and several of its advanced extensions, within the context of two complex combinatorial problems: the job-shop and the flexible job-shop scheduling problems, two fundamental challenges with multiple industrial applications. Our results challenge the assumption that policy-based methods are inherently superior for combinatorial optimization. We show that several value-based approaches can match or even outperform the widely adopted proximal policy optimization algorithm, suggesting that value-based strategies deserve greater attention from the combinatorial optimization community. Our code is openly available at: https://github.com/AJ-Correa/Unraveling-the-Rainbow.
LGMar 16, 2025
TuneNSearch: a hybrid transfer learning and local search approach for solving vehicle routing problemsArthur Corrêa, Cristóvão Silva, Liming Xu et al.
This paper introduces TuneNSearch, a hybrid transfer learning and local search approach for addressing diverse variants of the vehicle routing problem (VRP). Our method uses reinforcement learning to generate high-quality solutions, which are subsequently refined by an efficient local search procedure. To ensure broad adaptability across VRP variants, TuneNSearch begins with a pre-training phase on the multi-depot VRP (MDVRP), followed by a fine-tuning phase to adapt it to other problem formulations. The learning phase utilizes a Transformer-based architecture enhanced with edge-aware attention, which integrates edge distances directly into the attention mechanism to better capture spatial relationships inherent to routing problems. We show that the pre-trained model generalizes effectively to single-depot variants, achieving performance comparable to models trained specifically on single-depot instances. Simultaneously, it maintains strong performance on multi-depot variants, an ability that models pre-trained solely on single-depot problems lack. For example, on 100-node instances of multi-depot variants, TuneNSearch outperforms a model pre-trained on the CVRP by 44%. In contrast, on 100-node instances of single-depot variants, TuneNSearch performs similar to the CVRP model. To validate the effectiveness of our method, we conduct extensive computational experiments on public benchmark and randomly generated instances. Across multiple CVRPLIB datasets, TuneNSearch consistently achieves performance deviations of less than 3% from the best-known solutions in the literature, compared to 6-25% for other neural-based models, depending on problem complexity. Overall, our approach demonstrates strong generalization to different problem sizes, instance distributions, and VRP formulations, while maintaining polynomial runtime complexity despite the integration of the local search algorithm.