MLDec 9, 2025
Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and AnalysisOrit Davidovich, Shimrit Shtern, Segev Wasserkrug et al.
Since the 1990s, considerable empirical work has been carried out to train statistical models, such as neural networks (NNs), as learned heuristics for combinatorial optimization (CO) problems. When successful, such an approach eliminates the need for experts to design heuristics per problem type. Due to their structure, many hard CO problems are amenable to treatment through reinforcement learning (RL). Indeed, we find a wealth of literature training NNs using value-based, policy gradient, or actor-critic approaches, with promising results, both in terms of empirical optimality gaps and inference runtimes. Nevertheless, there has been a paucity of theoretical work undergirding the use of RL for CO problems. To this end, we introduce a unified framework to model CO problems through Markov decision processes (MDPs) and solve them using RL techniques. We provide easy-to-test assumptions under which CO problems can be formulated as equivalent undiscounted MDPs that provide optimal solutions to the original CO problems. Moreover, we establish conditions under which value-based RL techniques converge to approximate solutions of the CO problem with a guarantee on the associated optimality gap. Our convergence analysis provides: (1) a sufficient rate of increase in batch size and projected gradient descent steps at each RL iteration; (2) the resulting optimality gap in terms of problem parameters and targeted RL accuracy; and (3) the importance of a choice of state-space embedding. Together, our analysis illuminates the success (and limitations) of the celebrated deep Q-learning algorithm in this problem context.
GTJul 8, 2022
Online Learning in Supply-Chain GamesNicolò Cesa-Bianchi, Tommaso Cesari, Takayuki Osogami et al.
We study a repeated game between a supplier and a retailer who want to maximize their respective profits without full knowledge of the problem parameters. After characterizing the uniqueness of the Stackelberg equilibrium of the stage game with complete information, we show that even with partial knowledge of the joint distribution of demand and production costs, natural learning dynamics guarantee convergence of the joint strategy profile of supplier and retailer to the Stackelberg equilibrium of the stage game. We also prove finite-time bounds on the supplier's regret and asymptotic bounds on the retailer's regret, where the specific rates depend on the type of knowledge preliminarily available to the players. In the special case when the supplier is not strategic (vertical integration), we prove optimal finite-time regret bounds on the retailer's regret (or, equivalently, the social welfare) when costs and demand are adversarially generated and the demand is censored.
LGNov 3, 2025
Finding Probably Approximate Optimal Solutions by Training to Estimate the Optimal Values of SubproblemsNimrod Megiddo, Segev Wasserkrug, Orit Davidovich et al.
The paper is about developing a solver for maximizing a real-valued function of binary variables. The solver relies on an algorithm that estimates the optimal objective-function value of instances from the underlying distribution of objectives and their respective sub-instances. The training of the estimator is based on an inequality that facilitates the use of the expected total deviation from optimality conditions as a loss function rather than the objective-function itself. Thus, it does not calculate values of policies, nor does it rely on solved instances.
AIFeb 26, 2024
From Large Language Models and Optimization to Decision Optimization CoPilot: A Research ManifestoSegev Wasserkrug, Leonard Boussioux, Dick den Hertog et al.
Significantly simplifying the creation of optimization models for real-world business problems has long been a major goal in applying mathematical optimization more widely to important business and societal decisions. The recent capabilities of Large Language Models (LLMs) present a timely opportunity to achieve this goal. Therefore, we propose research at the intersection of LLMs and optimization to create a Decision Optimization CoPilot (DOCP) - an AI tool designed to assist any decision maker, interacting in natural language to grasp the business problem, subsequently formulating and solving the corresponding optimization model. This paper outlines our DOCP vision and identifies several fundamental requirements for its implementation. We describe the state of the art through a literature survey and experiments using ChatGPT. We show that a) LLMs already provide substantial novel capabilities relevant to a DOCP, and b) major research challenges remain to be addressed. We also propose possible research directions to overcome these gaps. We also see this work as a call to action to bring together the LLM and optimization communities to pursue our vision, thereby enabling much more widespread improved decision-making.
AINov 20, 2025
An Agent-Based Framework for the Automatic Validation of Mathematical Optimization ModelsAlexander Zadorojniy, Segev Wasserkrug, Eitan Farchi
Recently, using Large Language Models (LLMs) to generate optimization models from natural language descriptions has became increasingly popular. However, a major open question is how to validate that the generated models are correct and satisfy the requirements defined in the natural language description. In this work, we propose a novel agent-based method for automatic validation of optimization models that builds upon and extends methods from software testing to address optimization modeling . This method consists of several agents that initially generate a problem-level testing API, then generate tests utilizing this API, and, lastly, generate mutations specific to the optimization model (a well-known software testing technique assessing the fault detection power of the test suite). In this work, we detail this validation framework and show, through experiments, the high quality of validation provided by this agent ensemble in terms of the well-known software testing measure called mutation coverage.
GTNov 30, 2024
Achieving PAC Guarantees in Mechanism Design through Multi-Armed BanditsTakayuki Osogami, Hirota Kinoshita, Segev Wasserkrug
We analytically derive a class of optimal solutions to a linear program (LP) for automated mechanism design that satisfies efficiency, incentive compatibility, strong budget balance (SBB), and individual rationality (IR), where SBB and IR are enforced in expectation. These solutions can be expressed using a set of essential variables whose cardinality is exponentially smaller than the total number of variables in the original formulation. However, evaluating a key term in the solutions requires exponentially many optimization steps as the number of players $N$ increases. We address this by translating the evaluation of this term into a multi-armed bandit (MAB) problem and develop a probably approximately correct (PAC) estimator with asymptotically optimal sample complexity. This MAB-based approach reduces the optimization complexity from exponential to $O(N\log N)$. Numerical experiments confirm that our method efficiently computes mechanisms with the target properties, scaling to problems with up to $N=128$ players -- substantially improving over prior work.
AIJul 4, 2012
A Model for Reasoning with Uncertain Rules in Event Composition SystemsSegev Wasserkrug, Avigdor Gal, Opher Etzion
In recent years, there has been an increased need for the use of active systems - systems required to act automatically based on events, or changes in the environment. Such systems span many areas, from active databases to applications that drive the core business processes of today's enterprises. However, in many cases, the events to which the system must respond are not generated by monitoring tools, but must be inferred from other events based on complex temporal predicates. In addition, in many applications, such inference is inherently uncertain. In this paper, we introduce a formal framework for knowledge representation and reasoning enabling such event inference. Based on probability theory, we define the representation of the associated uncertainty. In addition, we formally define the probability space, and show how the relevant probabilities can be calculated by dynamically constructing a Bayesian network. To the best of our knowledge, this is the first work that enables taking such uncertainty into account in the context of active systems. herefore, our contribution is twofold: We formally define the representation and semantics of event composition for probabilistic settings, and show how to apply these extensions to the quantification of the occurrence probability of events. These results enable any active system to handle such uncertainty.