LG · AI · Jan 12

Reinforcement Learning Methods for Neighborhood Selection in Local Search

arXiv:2601.07948v1
Originality: Incremental advance
AI Analysis

This work addresses the underexamined application of reinforcement learning in local search for combinatorial optimization, but it is incremental as it compares existing methods across standard problems.

The study evaluated reinforcement learning-based neighborhood selection strategies in local search metaheuristics for combinatorial optimization problems, finding that ε-greedy consistently performed well while deep reinforcement learning methods required longer runtimes to be competitive.

Reinforcement learning has recently gained traction as a means to improve combinatorial optimization methods, yet its effectiveness within local search metaheuristics specifically remains comparatively underexamined. In this study, we evaluate a range of reinforcement learning-based neighborhood selection strategies -- multi-armed bandits (upper confidence bound, $ε$-greedy) and deep reinforcement learning methods (proximal policy optimization, double deep $Q$-network) -- and compare them against multiple baselines across three different problems: the traveling salesman problem, the pickup and delivery problem with time windows, and the car sequencing problem. We show how search-specific characteristics, particularly large variations in cost due to constraint violation penalties, necessitate carefully designed reward functions to provide stable and informative learning signals. Our extensive experiments reveal that algorithm performance varies substantially across problems, although $ε$-greedy consistently ranks among the best performers. In contrast, the computational overhead of deep reinforcement learning approaches makes them competitive only with substantially longer runtimes. These findings highlight both the promise and the practical limitations of deep reinforcement learning in local search.
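To make the idea of bandit-based neighborhood selection concrete, here is a minimal sketch of $ε$-greedy choosing among three move operators in a toy traveling salesman local search. It is not the paper's implementation: the operator set (swap, 2-opt, insertion), the clipped relative-improvement reward in `bounded_reward`, the improvement-only acceptance rule, and all names and parameter values are assumptions chosen for illustration. The clipping is one simple way to keep the learning signal bounded when costs jump sharply, as the abstract says happens with constraint violation penalties.

```python
import math
import random


class EpsilonGreedySelector:
    """Epsilon-greedy bandit over a fixed set of neighborhood operators."""

    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms  # running mean reward per operator
        self.rng = rng or random.Random()

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        best = max(self.values)
        return self.rng.choice([i for i, v in enumerate(self.values) if v == best])

    def update(self, arm, reward):
        # Incremental update of the mean reward for the chosen operator.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def bounded_reward(old_cost, new_cost):
    """Relative improvement clipped to [-1, 1] (hypothetical reward shaping)."""
    scale = max(abs(old_cost), 1e-12)
    return max(-1.0, min(1.0, (old_cost - new_cost) / scale))


def tour_length(tour, pts):
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def swap_move(tour, rng):
    i, j = rng.sample(range(len(tour)), 2)
    new = tour[:]
    new[i], new[j] = new[j], new[i]
    return new


def two_opt_move(tour, rng):
    # Reverse the segment between two random positions.
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def insertion_move(tour, rng):
    new = tour[:]
    city = new.pop(rng.randrange(len(new)))
    new.insert(rng.randrange(len(new) + 1), city)
    return new


def local_search(pts, iters=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    moves = [swap_move, two_opt_move, insertion_move]
    selector = EpsilonGreedySelector(len(moves), epsilon, rng)
    tour = list(range(len(pts)))
    rng.shuffle(tour)
    cost = tour_length(tour, pts)
    for _ in range(iters):
        arm = selector.select()
        candidate = moves[arm](tour, rng)
        candidate_cost = tour_length(candidate, pts)
        selector.update(arm, bounded_reward(cost, candidate_cost))
        if candidate_cost < cost:  # accept improving moves only
            tour, cost = candidate, candidate_cost
    return tour, cost, selector


if __name__ == "__main__":
    point_rng = random.Random(1)
    points = [(point_rng.random(), point_rng.random()) for _ in range(20)]
    best_tour, best_cost, sel = local_search(points)
    print(best_cost, sel.counts)
```

Because most random moves worsen the tour, the mean rewards in this sketch are typically negative and the selector favors the operator that hurts least on average; rewarding only improvements would be an equally plausible design, and the abstract does not say which form the authors use.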

