Generating Robust Portfolios of Optimization Models using Large Language Models
For practitioners needing reliable optimization models from natural language, this work provides a principled method to mitigate LLM unreliability, though it is an incremental improvement over single-model generation.
The paper addresses the unreliability of LLM-generated optimization models by proposing a portfolio generation algorithm that uses an LLM as both a stochastic generator and a reasoning evaluator, with theoretical guarantees that the portfolio contains high-quality candidates if either role is well-aligned with human preferences. Empirical results show strong performance across multiple optimization modeling tasks.
Mathematical optimization is a powerful tool for structured decision-making across domains such as resource allocation and planning. Formulating optimization models faithful to reality, though, remains a significant bottleneck, as it typically demands both domain expertise and optimization knowledge that are often scarce. Recent advances in large language models (LLMs) promise to bridge this gap, enabling the generation of candidate optimization models from natural language descriptions. However, there is no guarantee that any single LLM-generated model is reliable, and existing approaches that output only one model are therefore risky. In this work, we propose a novel algorithm that generates a portfolio of optimization models, designed to be robust to the limitations of LLMs. Our method exploits the observation that a single LLM can play two distinct roles (as a stochastic generator and as a reasoning evaluator) and proposes a unified framework that leverages both capabilities in a complementary manner. We provide theoretical guarantees showing that, as long as either the generator or the evaluator is well-aligned with human preferences, the portfolio is guaranteed to contain high-quality candidates, enabling a principled human-in-the-loop process in which a decision-maker can review multiple candidates before committing to one. We further validate our approach empirically, demonstrating strong performance across a range of optimization modeling tasks.
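The abstract does not specify the algorithm, so the following is only a minimal sketch of the "two complementary roles" intuition, not the paper's method or its guarantee. It assumes a hypothetical `sample` function standing in for the LLM as a stochastic generator and a hypothetical `score` function standing in for the LLM as an evaluator; `build_portfolio`, `k_gen`, and `k_eval` are likewise illustrative names. The portfolio is the union of the candidates the generator produces most often and the candidates the evaluator ranks highest.

```python
from collections import Counter
from typing import Callable, List


def build_portfolio(
    sample: Callable[[], str],      # hypothetical generator: returns one candidate model (as text)
    score: Callable[[str], float],  # hypothetical evaluator: higher means "judged better"
    n_samples: int,
    k_gen: int,
    k_eval: int,
) -> List[str]:
    """Illustrative portfolio: generator-favored picks plus evaluator-favored picks."""
    # Draw candidates; repeated outputs suggest high probability under the generator.
    counts = Counter(sample() for _ in range(n_samples))

    # Generator-side picks: the k_gen most frequently sampled distinct candidates.
    by_frequency = [cand for cand, _ in counts.most_common(k_gen)]

    # Evaluator-side picks: the k_eval highest-scoring distinct candidates.
    by_score = sorted(counts, key=score, reverse=True)[:k_eval]

    # Union of both pick lists, keeping first-seen order and dropping duplicates.
    portfolio: List[str] = []
    for cand in by_frequency + by_score:
        if cand not in portfolio:
            portfolio.append(cand)
    return portfolio
```

In this toy setup, a good candidate that the generator produces often enters through `by_frequency`, while a good candidate the evaluator ranks first enters through `by_score`, so the portfolio can contain it if either role is reliable. The paper's actual selection rule and its formal guarantee may differ.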