Multi-objective Pointer Network for Combinatorial Optimization
This work addresses multi-objective combinatorial optimization, a common challenge in real applications, by introducing a more efficient deep learning approach, though it appears incremental as it builds on existing pointer network methods.
The paper tackles multi-objective combinatorial optimization problems (MOCOPs) by proposing the multi-objective Pointer Network (MOPN), a single-model deep reinforcement learning (DRL) framework that modifies the input structure of the Pointer Network and adds two training strategies, one based on a representative model and one on transfer learning. On three multi-objective traveling salesman problems, it reports that MOPN outperforms the state-of-the-art DRL-MOA model and three classical multi-objective meta-heuristics while needing only 20% to 40% of DRL-MOA's training time.
Multi-objective combinatorial optimization problems (MOCOPs), a class of complex optimization problems, arise widely in real applications. Although meta-heuristics have been successfully applied to MOCOPs, their computation time is often long. Recently, a number of deep reinforcement learning (DRL) methods have been proposed to generate approximately optimal solutions to combinatorial optimization problems. However, existing DRL studies have seldom focused on MOCOPs. This study proposes a single-model deep reinforcement learning framework, called the multi-objective Pointer Network (MOPN), in which the input structure of the Pointer Network (PN) is improved so that a single PN is capable of solving MOCOPs. In addition, two training strategies, based on a representative model and on transfer learning, respectively, are proposed to further enhance the performance of MOPN in different application scenarios. Moreover, compared to classical meta-heuristics, MOPN needs far less time to obtain the Pareto front, since it requires only forward propagation. Meanwhile, MOPN is insensitive to problem scale, meaning that a trained MOPN is able to address MOCOPs of different scales. To verify the performance of MOPN, extensive experiments are conducted on three multi-objective traveling salesman problems, in comparison with one state-of-the-art model, DRL-MOA, and three classical multi-objective meta-heuristics. Experimental results demonstrate that the proposed model outperforms all the comparative methods while requiring only 20% to 40% of the training time of DRL-MOA.
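To make the notion of a Pareto front for a multi-objective traveling salesman problem concrete, here is a minimal Python sketch. It is not the MOPN method: brute-force enumeration of tours stands in for the model's forward pass, and the instance, in which each objective is a Euclidean tour length over its own set of city coordinates, is an assumption made for illustration. The names `tour_length`, `dominates`, and `pareto_front` are hypothetical helpers, not taken from the paper.

```python
import math
from itertools import permutations


def tour_length(tour, coords):
    """Length of the closed tour visiting `coords` in the order given by `tour`."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
        for i in range(n)
    )


def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Non-dominated objective vectors, sorted and without duplicates."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


if __name__ == "__main__":
    # Hypothetical 4-city instance: one coordinate set per objective (assumption).
    coords_1 = [(0, 0), (1, 0), (1, 1), (0, 1)]
    coords_2 = [(0, 0), (1, 1), (1, 0), (0, 1)]

    # Enumerate every tour that starts at city 0; feasible only for tiny instances.
    tours = [(0,) + rest for rest in permutations(range(1, 4))]

    # Round so that a tour and its reverse map to the same objective vector.
    costs = [
        (round(tour_length(t, coords_1), 9), round(tour_length(t, coords_2), 9))
        for t in tours
    ]
    print(pareto_front(costs))
```

Exhaustive enumeration grows factorially with the number of cities, which is why meta-heuristics or learned models are used in practice to produce candidate tours; the non-dominated filtering step applies to their output in the same way.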