AI · LG · SY · OC · Dec 11, 2025

CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound

arXiv:2512.11169v1
Originality: Incremental advance
AI Analysis

This work addresses the difficulty of modeling stochastic real-world problems as MILPs by tuning the model for decision quality rather than modeling accuracy. It is an incremental advance, presented as a proof of concept.

The authors tackle suboptimal real-world performance in combinatorial sequential decision making by introducing CORL, a framework that fine-tunes mixed integer linear program (MILP) policies with reinforcement learning on real-world data. They validate it on a simple illustrative example.

Combinatorial sequential decision making problems are typically modeled as mixed integer linear programs (MILPs) and solved via branch and bound (B&B) algorithms. The inherent difficulty of building MILP models that accurately represent stochastic real-world problems leads to suboptimal performance in practice. Recently, machine learning methods have been applied to build MILP models for decision quality rather than for how accurately they model the real-world problem. However, these approaches typically rely on supervised learning, assume access to true optimal decisions, and use surrogates for the MILP gradients. In this work, we introduce CORL, a proof-of-concept framework that fine-tunes an MILP scheme end to end using reinforcement learning (RL) on real-world data to maximize its operational performance. We enable this by casting an MILP solved by B&B as a differentiable stochastic policy compatible with RL. We validate the CORL method on a simple illustrative combinatorial sequential decision making example.
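The abstract does not say how CORL turns a B&B-solved MILP into a differentiable stochastic policy, so the sketch below is only a generic illustration of the idea and should not be read as the authors' method. It makes several assumptions of its own: a toy knapsack whose weights, capacity, and values are hypothetical, brute-force enumeration standing in for B&B, a Boltzmann (softmax) distribution over feasible solutions as the stochastic policy, and a plain REINFORCE update with a running-average baseline. The objective coefficients `theta` play the role of the MILP parameters being fine-tuned from observed rewards.

```python
import itertools
import math
import random

# Hypothetical toy knapsack; none of these numbers come from the paper.
WEIGHTS = [2, 2, 3]
CAPACITY = 4
TRUE_VALUES = [1.0, 1.0, 5.0]  # hidden mean item values of the "real world"


def feasible_solutions():
    """Enumerate every binary x with WEIGHTS . x <= CAPACITY (stand-in for B&B)."""
    return [x for x in itertools.product((0, 1), repeat=len(WEIGHTS))
            if sum(w * xi for w, xi in zip(WEIGHTS, x)) <= CAPACITY]


def score(theta, x):
    """MILP objective theta . x for a candidate solution x."""
    return sum(t * xi for t, xi in zip(theta, x))


def solve(theta, feasible):
    """Deterministic MILP decision: the feasible x maximizing theta . x."""
    return max(feasible, key=lambda x: score(theta, x))


def policy(theta, feasible):
    """Boltzmann policy over feasible solutions: pi(x) proportional to exp(theta . x)."""
    scores = [score(theta, x) for x in feasible]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def expected_reward(theta, feasible):
    """Mean real-world reward of the stochastic policy (uses the hidden values)."""
    probs = policy(theta, feasible)
    return sum(p * score(TRUE_VALUES, x) for p, x in zip(probs, feasible))


def train(theta, steps=5000, lr=0.05, noise=0.5, seed=0):
    """REINFORCE on the MILP objective coefficients, using sampled noisy rewards."""
    rng = random.Random(seed)
    feasible = feasible_solutions()
    theta = list(theta)
    baseline = 0.0
    for _ in range(steps):
        probs = policy(theta, feasible)
        x = rng.choices(feasible, weights=probs)[0]
        # Noisy reward observed from the environment for the sampled decision.
        reward = sum(xi * (v + rng.gauss(0.0, noise))
                     for v, xi in zip(TRUE_VALUES, x))
        # For this softmax policy, grad_theta log pi(x) = x - E_pi[x].
        mean_x = [sum(p * y[i] for p, y in zip(probs, feasible))
                  for i in range(len(theta))]
        advantage = reward - baseline
        theta = [t + lr * advantage * (xi - m)
                 for t, xi, m in zip(theta, x, mean_x)]
        baseline += 0.05 * (reward - baseline)  # running-average baseline
    return theta


if __name__ == "__main__":
    feasible = feasible_solutions()
    theta0 = [1.5, 1.5, 1.0]  # deliberately misspecified initial model
    theta1 = train(theta0)
    print("decision before:", solve(theta0, feasible))
    print("decision after: ", solve(theta1, feasible))
    print("expected reward:", expected_reward(theta0, feasible),
          "->", expected_reward(theta1, feasible))
```

The initial coefficients are chosen so that the model prefers the two cheap items even though the third item is worth more in the simulated environment. The update needs no labeled optimal decisions: it uses only sampled decisions and their observed rewards, which is the contrast with supervised approaches that the abstract draws.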

