OC · AI · DM · LG · Jun 3, 2022

A Deep Reinforcement Learning Framework For Column Generation

U of Toronto
arXiv:2206.02568v3 · 42 citations · h-index: 24 · Has Code
Originality: Highly original
AI Analysis

This work addresses the computational bottleneck of solving large-scale integer linear programs, which arise in areas such as logistics and resource allocation. It introduces an RL-based method that speeds up convergence, although the contribution is incremental in that it applies RL to an existing optimization framework.

The paper accelerates Column Generation (CG) for large-scale linear programs by proposing RLCG, the first Reinforcement Learning approach for CG. Compared to a greedy baseline, RLCG reduces the number of CG iterations by 22.4% on average for the Cutting Stock Problem and by 40.9% for the Vehicle Routing Problem with Time Windows.

Column Generation (CG) is an iterative algorithm for solving linear programs (LPs) with an extremely large number of variables (columns). CG is the workhorse for tackling large-scale integer linear programs, which rely on CG to solve LP relaxations within a branch-and-price algorithm. Two canonical applications are the Cutting Stock Problem (CSP) and Vehicle Routing Problem with Time Windows (VRPTW). In VRPTW, for example, each binary variable represents the decision to include or exclude a route, of which there are exponentially many; CG incrementally grows the subset of columns being used, ultimately converging to an optimal solution. We propose RLCG, the first Reinforcement Learning (RL) approach for CG. Unlike typical column selection rules which myopically select a column based on local information at each iteration, we treat CG as a sequential decision-making problem: the column selected in a given iteration affects subsequent column selections. This perspective lends itself to a Deep Reinforcement Learning approach that uses Graph Neural Networks (GNNs) to represent the variable-constraint structure in the LP of interest. We perform an extensive set of experiments using the publicly available BPPLIB benchmark for CSP and Solomon benchmark for VRPTW. RLCG converges faster and reduces the number of CG iterations by 22.4% for CSP and 40.9% for VRPTW on average compared to a commonly used greedy policy. Our code is available at https://github.com/chichengmessi/reinforcement-learning-for-column-generation.git.
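To make the column selection step concrete, here is a minimal sketch of one pricing step for the Cutting Stock Problem. It assumes the textbook formulation in which each column is a cutting pattern that uses one roll, so a pattern's reduced cost is 1 minus the total dual value of the items it cuts, and the best new pattern is found by an unbounded knapsack. The abstract does not spell out this formulation. The function name `price_pattern`, its parameters, and the numbers are hypothetical and not taken from the authors' repository or from BPPLIB.

```python
from typing import List, Tuple


def price_pattern(widths: List[int], duals: List[float],
                  roll_width: int) -> Tuple[float, List[int]]:
    """Return (reduced_cost, pattern) for the best new cutting pattern.

    Solves an unbounded knapsack by dynamic programming: maximise the total
    dual value of the items cut from one roll, subject to the roll width.
    Assumes positive integer widths and non-negative dual values.
    """
    best = [0.0] * (roll_width + 1)    # best[c]: max dual value within capacity c
    choice = [-1] * (roll_width + 1)   # item cut last at capacity c, -1 = leave slack
    for c in range(1, roll_width + 1):
        best[c] = best[c - 1]          # default: one unit of width stays unused
        for i, w in enumerate(widths):
            if w <= c and best[c - w] + duals[i] > best[c]:
                best[c] = best[c - w] + duals[i]
                choice[c] = i

    # Walk back through the choices to recover how often each item is cut.
    pattern = [0] * len(widths)
    c = roll_width
    while c > 0:
        if choice[c] == -1:
            c -= 1
        else:
            pattern[choice[c]] += 1
            c -= widths[choice[c]]

    # Each pattern uses one roll (cost 1), so reduced cost = 1 - dual value.
    return 1.0 - best[roll_width], pattern


# Illustrative data only: three item widths, made-up duals, roll width 10.
reduced_cost, pattern = price_pattern([3, 5, 7], [0.25, 0.5, 0.875], 10)
print(reduced_cost, pattern)  # -0.125 [1, 0, 1]
```

A negative reduced cost means the pattern can improve the restricted LP, and CG stops once no pattern has a negative reduced cost. A greedy rule would always add the single most negative column found this way; the abstract's argument is that this choice is myopic, which is why RLCG instead learns which candidate column to add at each iteration.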

