Reinforcement Learning for Node Selection in Branch-and-Bound
This addresses a key bottleneck in optimization algorithms for operations research, offering a novel approach that outperforms existing methods.
The paper tackles the problem of node selection in branch-and-bound algorithms by proposing a reinforcement learning method that considers the entire tree state, achieving significant improvements in optimality gap reductions and per-node efficiency on benchmarks under strict time constraints.
A big challenge in branch and bound lies in identifying the optimal node within the search tree from which to proceed. Current state-of-the-art selectors utilize either hand-crafted ensembles that automatically switch between naive sub-node selectors, or learned node selectors that rely on individual node data. We propose a novel simulation technique that uses reinforcement learning (RL) while considering the entire tree state, rather than just isolated nodes. To achieve this, we train a graph neural network that produces a probability distribution based on the path from the model's root to its "to-be-selected" leaves. Modelling node-selection as a probability distribution allows us to train the model using state-of-the-art RL techniques that capture both intrinsic node-quality and node-evaluation costs. Our method induces a high quality node selection policy on a set of varied and complex problem sets, despite only being trained on specially designed, synthetic travelling salesmen problem (TSP) instances. Using such a fixed pretrained policy shows significant improvements on several benchmarks in optimality gap reductions and per-node efficiency under strict time constraints.