AI · LG · Jun 3

Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems

Xizi Luo, Changhong He, Dongdong Geng, Chenggong Shi, Yu Mei

arXiv:2606.04816

AI Analysis

For operations research practitioners using LLMs to generate optimization code, this work addresses the critical problem of silent constraint errors, providing a verification method and model that significantly improves reliability.

LLMs often produce incorrect solver code for constraint-dense optimization problems because objective-equivalence checks miss spurious or omitted constraints that are non-binding on the tested instance. The authors propose constraint injection, which combines with differential testing to form a dual verifier, and build VRPCoder, an 8B model whose GRPO-trained variant reaches 93% average Pass@1 across four VRP benchmarks, outperforming Gemini-3.1-Pro Preview on three of them and exceeding Claude-Sonnet-4.5 by 28 points on average.

Large language models (LLMs) increasingly translate natural-language optimization problems into executable solver code. Yet for constraint-dense operations research (OR) problems, existing data-filtering and training pipelines largely rely on objective-equivalence signals such as differential testing and answer agreement, which a program can pass while adding spurious constraints or silently omitting required ones, whenever those constraints are non-binding on the tested instance. We propose constraint injection, which uses feasible probes to expose spurious over-constraint and one-constraint-violating probes to reveal silent constraint omission. Combined with differential testing, it forms a dual verifier. We instantiate and evaluate it on vehicle routing problems (VRPs), a representative constraint-dense combinatorial optimization testbed with coupled operational constraints. We develop VRPCoder, an 8B end-to-end model that translates natural-language VRP scenarios into Gurobi scripts, together with an expert-verified VRP benchmark suite covering 21 variants. The verifier is reused as a rejection-sampling filter during data synthesis and as a per-rollout reward in group relative policy optimization (GRPO). Across four VRP benchmarks, VRPCoder-GRPO reaches 93% average Pass@1, outperforms Gemini-3.1-Pro Preview on three benchmarks, exceeds Claude-Sonnet-4.5 by 28 average points, and surpasses prior OR-LLMs by 78 average points.
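The failure mode described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy three-customer routing instance, replaces Gurobi with brute-force enumeration, and models each "generated program" as a plain feasibility predicate. The instance, the precedence rule, the probe sets, and every function name below are hypothetical choices made for illustration. It shows two faulty candidates that match the reference objective (so differential testing passes) but are exposed by feasible and one-constraint-violating probes.

```python
from itertools import permutations

# Toy instance (made up for illustration): depot 0, three customers on a line.
POSITION = {0: 0, 1: 1, 2: 2, 3: 3}
CUSTOMERS = [1, 2, 3]

def cost(routes):
    """Total travel distance; every route starts and ends at the depot."""
    total = 0
    for route in routes:
        path = [0, *route, 0]
        total += sum(abs(POSITION[a] - POSITION[b]) for a, b in zip(path, path[1:]))
    return total

def covers_all(routes):
    return sorted(c for r in routes for c in r) == CUSTOMERS

def pickup_before_delivery(routes):
    # Toy precedence rule: if customers 1 and 3 share a route, 1 comes first.
    return all(r.index(1) < r.index(3) for r in routes if 1 in r and 3 in r)

def reference(routes):            # trusted model
    return covers_all(routes) and pickup_before_delivery(routes)

def omits_precedence(routes):     # faulty candidate: silently drops a constraint
    return covers_all(routes)

def over_constrained(routes):     # faulty candidate: spurious single-vehicle rule
    return reference(routes) and len(routes) <= 1

def solve(is_feasible):
    """Brute-force stand-in for a solver: best cost using at most two routes."""
    best = None
    for perm in permutations(CUSTOMERS):
        for k in range(len(perm) + 1):
            routes = [r for r in (perm[:k], perm[k:]) if r]
            if is_feasible(routes):
                c = cost(routes)
                best = c if best is None else min(best, c)
    return best

def differential_test(candidate):
    """Objective-equivalence check against the reference model."""
    return solve(candidate) == solve(reference)

# Probes are hand-written here; each violating probe breaks exactly one constraint.
FEASIBLE_PROBES = [[[1, 2, 3]], [[1, 3], [2]]]
VIOLATING_PROBES = {"coverage": [[1, 2]], "precedence": [[3, 2, 1]]}

def constraint_injection(candidate):
    findings = []
    if any(not candidate(p) for p in FEASIBLE_PROBES):
        findings.append("spurious")            # a feasible plan was rejected
    for name, probe in VIOLATING_PROBES.items():
        if candidate(probe):
            findings.append(f"omitted:{name}")  # an infeasible plan was accepted
    return findings

def dual_verify(candidate):
    return differential_test(candidate) and not constraint_injection(candidate)

for model in (reference, omits_precedence, over_constrained):
    print(model.__name__, differential_test(model), constraint_injection(model))
```

In this toy setup the optimal plan satisfies both the omitted and the spurious constraint, so objective agreement alone cannot separate the three models. A boolean such as `dual_verify` is the kind of signal that could serve as a rejection-sampling filter or a per-rollout reward, as the abstract describes, though the actual reward design is not specified here.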

