Rethink Efficiency Side of Neural Combinatorial Solver: An Offline and Self-Play Paradigm
This work addresses efficiency bottlenecks for researchers and practitioners in neural combinatorial optimization, though it is incremental as it builds on existing methods like DPO and Mamba.
The paper tackled the inefficiency of neural combinatorial optimization by proposing ECO, an offline self-play paradigm that uses supervised warm-up and Direct Preference Optimization, achieving competitive performance on TSP and CVRP with significant improvements in memory utilization and training throughput.
We propose ECO, a versatile learning paradigm that enables efficient offline self-play for Neural Combinatorial Optimization (NCO). ECO addresses key limitations in the field through: 1) Paradigm Shift: Moving beyond inefficient online paradigms, we introduce a two-phase offline paradigm consisting of supervised warm-up and iterative Direct Preference Optimization (DPO); 2) Architecture Shift: We deliberately design a Mamba-based architecture to further enhance the efficiency in the offline paradigm; and 3) Progressive Bootstrapping: To stabilize training, we employ a heuristic-based bootstrapping mechanism that ensures continuous policy improvement during training. Comparison results on TSP and CVRP highlight that ECO performs competitively with up-to-date baselines, with significant advantage on the efficiency side in terms of memory utilization and training throughput. We provide further in-depth analysis on the efficiency, throughput and memory usage of ECO. Ablation studies show rationale behind our designs.