TreeDQN: Learning to minimize Branch-and-Bound tree
This work addresses a domain-specific bottleneck in combinatorial optimization, offering incremental improvements over existing reinforcement learning approaches.
The authors tackled the problem of improving the efficiency of Branch-and-Bound solvers for combinatorial optimization by learning a branching heuristic using reinforcement learning, resulting in an agent that requires less training data and produces smaller trees compared to prior methods.
Combinatorial optimization problems require an exhaustive search to find the optimal solution. A convenient approach to solving combinatorial optimization tasks in the form of Mixed Integer Linear Programs is Branch-and-Bound. Branch-and-Bound solver splits a task into two parts dividing the domain of an integer variable, then it solves them recursively, producing a tree of nested sub-tasks. The efficiency of the solver depends on the branchning heuristic used to select a variable for splitting. In the present work, we propose a reinforcement learning method that can efficiently learn the branching heuristic. We view the variable selection task as a tree Markov Decision Process, prove that the Bellman operator adapted for the tree Markov Decision Process is contracting in mean, and propose a modified learning objective for the reinforcement learning agent. Our agent requires less training data and produces smaller trees compared to previous reinforcement learning methods.