ML · LG · May 18

Provably Data-driven Lagrangian Relaxation for Mixed Integer Linear Programming

Tung Quoc Le, Anh Tuan Nguyen, Viet Anh Nguyen

arXiv:2605.19052 · 35.7

Predicted impact: top 48% in ML · last 90 days

Originality: Incremental advance

AI Analysis

For researchers in combinatorial optimization and ML, this work bridges a theoretical gap in data-driven Lagrangian relaxation, though the results are incremental as they confirm expected rates.

This paper provides the first theoretical analysis of learning Lagrangian multipliers for MILP, deriving generalization bounds and proving that stochastic gradient ascent achieves minimax-optimal rates, with an extension to warm-starting that yields faster convergence.

Lagrangian Relaxation (LR) is a powerful technique for solving large-scale Mixed Integer Linear Programs (MILPs), particularly those with decomposable structure, such as vehicle routing or unit commitment problems. By relaxing the coupling constraints, LR enables parallel subproblem solving and often yields tighter dual bounds than standard linear programming relaxations, which is crucial for efficient branch-and-bound pruning. While recent empirical work has shown promising results using machine learning to predict the Lagrangian multipliers, a theoretical understanding of such methods has been lacking. In this work, we bridge this gap by analyzing the problem of learning LR through the lens of Data-driven Algorithm Design, i.e., as a statistical learning problem over a distribution of problem instances. Our contributions are as follows. First, we derive a generalization bound of $\mathcal{O}(s^{1.5}/\sqrt{N})$ for the learned multipliers, where $s$ is the number of coupling constraints and $N$ is the sample size. Second, we provide a minimax lower bound of $\Omega(s/\sqrt{N})$, proving that a linear dependence on $s$ is unavoidable. Third, we constructively close the gap between these two bounds by proving that Stochastic Gradient Ascent (SGA) with averaging achieves the minimax-optimal rate $\Theta(s/\sqrt{N})$. Finally, we extend our framework to the learning-to-warm-start setting, proving that it achieves a fast, minimax-optimal rate of $\Theta(s/N)$ and establishing a theoretical advantage over direct multiplier prediction.
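To make the setting in the abstract concrete, the following is a minimal sketch of learning a single multiplier vector by projected stochastic (super)gradient ascent with iterate averaging. It is not the authors' implementation. Everything specific here is an assumption chosen for illustration: the toy problem class (minimize $c^\top x$ subject to $Ax \le b$ with $x \in \{0,1\}^n$, where every row of $A$ is treated as a coupling constraint), the instance distribution in `sample_instance`, the projection box of size `radius`, and the constant step size. For this toy class the Lagrangian bound $L(\lambda) = \min_{x \in \{0,1\}^n} c^\top x + \lambda^\top (Ax - b)$ separates across coordinates, so it can be evaluated in closed form.

```python
import itertools
import numpy as np


def lagrangian_bound(c, A, b, lam):
    """Evaluate L(lam) = min_{x in {0,1}^n} c@x + lam@(A@x - b).

    Returns the bound and a supergradient A@x* - b at lam.
    The minimization separates per coordinate: set x_j = 1 exactly
    when its reduced cost c_j + (A^T lam)_j is negative.
    """
    reduced = c + A.T @ lam
    x = (reduced < 0).astype(float)
    value = reduced @ x - lam @ b
    return value, A @ x - b


def sample_instance(rng, n=6, s=2):
    """Hypothetical instance distribution: a small knapsack-like MILP.

    Costs are negative (so selecting items is attractive), A >= 0 and
    b >= 0, which makes x = 0 feasible for every sampled instance.
    """
    c = -rng.uniform(1.0, 2.0, size=n)
    A = rng.uniform(0.0, 1.0, size=(s, n))
    b = 0.4 * A.sum(axis=1)
    return c, A, b


def sga_with_averaging(instances, s, radius=10.0):
    """One pass of projected stochastic supergradient ascent.

    Each instance is used once. Iterates are projected onto the box
    [0, radius]^s (an assumed feasible set for the multipliers), and
    the running average of the iterates is returned.
    """
    n_samples = len(instances)
    step = 1.0 / np.sqrt(n_samples)  # assumed constant step size
    lam = np.zeros(s)
    avg = np.zeros(s)
    for t, (c, A, b) in enumerate(instances, start=1):
        _, grad = lagrangian_bound(c, A, b, lam)
        lam = np.clip(lam + step * grad, 0.0, radius)
        avg += (lam - avg) / t  # incremental mean of the iterates
    return avg


def brute_force_optimum(c, A, b):
    """Exact optimum by enumeration; only viable for tiny n."""
    best = np.inf
    for bits in itertools.product([0.0, 1.0], repeat=len(c)):
        x = np.array(bits)
        if np.all(A @ x <= b + 1e-12):
            best = min(best, float(c @ x))
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = [sample_instance(rng) for _ in range(200)]
    lam_hat = sga_with_averaging(train, s=2)

    c, A, b = sample_instance(rng)  # a fresh, unseen instance
    bound, _ = lagrangian_bound(c, A, b, lam_hat)
    print("learned multipliers:", lam_hat)
    print("dual bound:", bound, "optimum:", brute_force_optimum(c, A, b))
```

By weak duality, the bound obtained from any nonnegative multiplier vector never exceeds the true optimum of a feasible instance, so the learned multipliers always give a valid (if possibly loose) dual bound on unseen instances. How tight that bound is on average is the quantity the rates in the abstract are about.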
