LG · Jun 3, 2025

MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver

Yuepeng Zheng, Fu Luo, Zhenkun Wang, Yaoxin Wu, Yu Zhou

arXiv:2506.02935v418.810 citations · h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the problem of poor generalization in multi-task neural combinatorial optimization for vehicle routing, offering a method that improves scalability and performance on diverse tasks, though it is incremental in combining existing techniques like knowledge distillation.

The paper tackles the limited generalization of existing multi-task learning methods for neural vehicle routing solvers by proposing MTL-KD, which uses knowledge distillation to train heavy decoder models and reports superior performance on problems with up to 1000 nodes across 16 VRP variants (6 seen, 10 unseen).

Multi-Task Learning (MTL) in Neural Combinatorial Optimization (NCO) is a promising approach to train a unified model capable of solving multiple Vehicle Routing Problem (VRP) variants. However, existing Reinforcement Learning (RL)-based multi-task methods can only train light decoder models on small-scale problems, exhibiting limited generalization ability when solving large-scale problems. To overcome this limitation, this work introduces a novel multi-task learning method driven by knowledge distillation (MTL-KD), which enables the efficient training of heavy decoder models with strong generalization ability. The proposed MTL-KD method transfers policy knowledge from multiple distinct RL-based single-task models to a single heavy decoder model, facilitating label-free training and effectively improving the model's generalization ability across diverse tasks. In addition, we introduce a flexible inference strategy termed Random Reordering Re-Construction (R3C), which is specifically adapted for diverse VRP tasks and further boosts the performance of the multi-task model. Experimental results on 6 seen and 10 unseen VRP variants with up to 1000 nodes indicate that our proposed method consistently achieves superior performance on both uniform and real-world benchmarks, demonstrating robust generalization abilities.
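The distillation idea in the abstract can be illustrated with a minimal sketch. It assumes, and this is an assumption rather than something the abstract states, that the student is trained to match each teacher's next-node probability distribution at a decoding step via a KL divergence, which needs no ground-truth solution labels. The names `softmax`, `kd_loss`, `teachers`, and `student`, the toy logits, and the two task names are all hypothetical stand-ins, not the authors' implementation.

```python
import math

def softmax(logits, mask):
    """Softmax over feasible nodes only; masked-out nodes get probability 0."""
    top = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, mask):
    """KL(teacher || student) over the next-node distribution at one decoding step.

    Assumed loss form: the abstract only says policy knowledge is transferred.
    """
    p = softmax(teacher_logits, mask)
    q = softmax(student_logits, mask)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical setup: one frozen single-task teacher per VRP variant and one
# shared student. Real models would map a routing state to logits; these
# lambdas return fixed toy logits for four candidate nodes.
teachers = {
    "CVRP": lambda state: [2.0, 0.5, -1.0, 0.0],
    "VRPTW": lambda state: [0.1, 1.5, 0.3, -0.5],
}
student = lambda state: [1.0, 1.0, 0.0, 0.0]

mask = [True, True, True, False]  # node 3 is infeasible at this step
losses = {task: kd_loss(t(None), student(None), mask) for task, t in teachers.items()}
batch_loss = sum(losses.values()) / len(losses)  # averaged across tasks
```

In this sketch the only supervision signal is the teachers' output distributions, which is one way to read "label-free training": no optimal tours are required, only access to the trained single-task policies.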
