OCJun 17, 2023
A Survey of Contextual Optimization Methods for Decision Making under UncertaintyUtsav Sadana, Abhilash Chenreddy, Erick Delage et al.
Recently there has been a surge of interest in operations research (OR) and the machine learning (ML) community in combining prediction algorithms and optimization techniques to solve decision-making problems in the face of uncertainty. This gave rise to the field of contextual optimization, under which data-driven procedures are developed to prescribe actions to the decision-maker that make the best use of the most recently updated information. A large variety of models and methods have been presented in both OR and ML literature under a variety of names, including data-driven optimization, prescriptive optimization, predictive stochastic programming, policy optimization, (smart) predict/estimate-then-optimize, decision-focused learning, (task-based) end-to-end learning/forecasting/optimization, etc. Focusing on single and two-stage stochastic programming problems, this review article identifies three main frameworks for learning policies from data and discusses their strengths and limitations. We present the existing models and methods under a uniform notation and terminology and classify them according to the three main frameworks identified. Our objective with this survey is to both strengthen the general understanding of this active field of research and stimulate further theoretical and algorithmic advancements in integrating ML and stochastic programming.
OCMay 2, 2022
Fast Continuous and Integer L-shaped Heuristics Through Supervised LearningEric Larsen, Emma Frejinger, Bernard Gendron et al.
We propose a methodology at the nexus of operations research and machine learning (ML) leveraging generic approximators available from ML to accelerate the solution of mixed-integer linear two-stage stochastic programs. We aim at solving problems where the second stage is highly demanding. Our core idea is to gain large reductions in online solution time while incurring small reductions in first-stage solution accuracy by substituting the exact second-stage solutions with fast, yet accurate supervised ML predictions. This upfront investment in ML would be justified when similar problems are solved repeatedly over time, for example, in transport planning related to fleet management, routing and container yard management. Our numerical results focus on the problem class seminally addressed with the integer and continuous L-shaped cuts. Our extensive empirical analysis is grounded in standardized families of problems derived from stochastic server location (SSLP) and stochastic multi knapsack (SMKP) problems available in the literature. The proposed method can solve the hardest instances of SSLP in less than 9% of the time it takes the state-of-the-art exact method, and in the case of SMKP the same figure is 20%. Average optimality gaps are in most cases less than 0.1%.
MEOct 25, 2022
Arc travel time and path choice model estimation subsumedSobhan Mohammadpour, Emma Frejinger
We address the problem of simultaneously estimating arc travel times in a network \emph{and} parameters of route choice models for strategic and tactical network planning purposes. Hitherto, these interdependent tasks have been approached separately in the literature on road traffic networks. We illustrate that ignoring this interdependence can lead to erroneous route choice model parameter estimates. We propose a method for maximum likelihood estimation to solve the simultaneous estimation problem that is applicable to any differentiable route choice model. Moreover, our approach allows to naturally mix observations at varying levels of granularity, including noisy or partial path data. Numerical results based on real taxi data from New York City show strong performance of our method, even in comparison to a benchmark method focused solely on arc travel time estimation.
44.5LGMar 17
Contextual Preference Distribution LearningBenjamin Hudson, Laurent Charlin, Emma Frejinger
Decision-making problems often feature uncertainty stemming from heterogeneous and context-dependent human preferences. To address this, we propose a sequential learning-and-optimization pipeline to learn preference distributions and leverage them to solve downstream problems, for example risk-averse formulations. We focus on human choice settings that can be formulated as (integer) linear programs. In such settings, existing inverse optimization and choice modelling methods infer preferences from observed choices but typically produce point estimates or fail to capture contextual shifts, making them unsuitable for risk-averse decision-making. Using a bounded-variance score function gradient estimator, we train a predictive model mapping contextual features to a rich class of parameterizable distributions. This approach yields a maximum likelihood estimate. The model generates scenarios for unseen contexts in the subsequent optimization phase. In a synthetic ridesharing environment, our approach reduces average post-decision surprise by up to 114$\times$ compared to a risk-neutral approach with perfect predictions and up to 25$\times$ compared to leading risk-averse baselines.
OCMay 5, 2023Code
Scope Restriction for Scalable Real-Time Railway Rescheduling: An Exploratory StudyErik Nygren, Christian Eichenberger, Emma Frejinger
With the aim to stimulate future research, we describe an exploratory study of a railway rescheduling problem. A widely used approach in practice and state of the art is to decompose these complex problems by geographical scope. Instead, we propose defining a core problem that restricts a rescheduling problem in response to a disturbance to only trains that need to be rescheduled, hence restricting the scope in both time and space. In this context, the difficulty resides in defining a scoper that can predict a subset of train services that will be affected by a given disturbance. We report preliminary results using the Flatland simulation environment that highlights the potential and challenges of this idea. We provide an extensible playground open-source implementation based on the Flatland railway environment and Answer-Set Programming.
LGDec 21, 2023
Maximum entropy GFlowNets with soft Q-learningSobhan Mohammadpour, Emmanuel Bengio, Emma Frejinger et al.
Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov Chain Monte Carlo (MCMC) methods. While GFNs draw inspiration from maximum entropy reinforcement learning (RL), the connection between the two has largely been unclear and seemingly applicable only in specific cases. This paper addresses the connection by constructing an appropriate reward function, thereby establishing an exact relationship between GFNs and maximum entropy RL. This construction allows us to introduce maximum entropy GFNs, which, in contrast to GFNs with uniform backward policy, achieve the maximum entropy attainable by GFNs without constraints on the state space.
LGJun 10, 2024
Decoupling regularization from the action spaceSobhan Mohammadpour, Emma Frejinger, Pierre-Luc Bacon
Regularized reinforcement learning (RL), particularly the entropy-regularized kind, has gained traction in optimal control and inverse RL. While standard unregularized RL methods remain unaffected by changes in the number of actions, we show that it can severely impact their regularized counterparts. This paper demonstrates the importance of decoupling the regularizer from the action space: that is, to maintain a consistent level of regularization regardless of how many actions are involved to avoid over-regularization. Whereas the problem can be avoided by introducing a task-specific temperature parameter, it is often undesirable and cannot solve the problem when action spaces are state-dependent. In the state-dependent action context, different states with varying action spaces are regularized inconsistently. We introduce two solutions: a static temperature selection approach and a dynamic counterpart, universally applicable where this problem arises. Implementing these changes improves performance on the DeepMind control suite in static and dynamic temperature regimes and a biological sequence design task.
OCFeb 7, 2022
Minimizing Entropy to Discover Good Solutions to Recurrent Mixed Integer ProgramsCharly Robinson La Rocca, Emma Frejinger, Jean-François Cordeau
Current state-of-the-art solvers for mixed-integer programming (MIP) problems are designed to perform well on a wide range of problems. However, for many real-world use cases, problem instances come from a narrow distribution. This has motivated the development of specialized methods that can exploit the information in historical datasets to guide the design of heuristics. Recent works have shown that machine learning (ML) can be integrated with an MIP solver to inject domain knowledge and efficiently close the optimality gap. This hybridization is usually done with deep learning (DL), which requires a large dataset and extensive hyperparameter tuning to perform well. This paper proposes an online heuristic that uses the notion of entropy to efficiently build a model with minimal training data and tuning. We test our method on the locomotive assignment problem (LAP), a recurring real-world problem that is challenging to solve at scale. Experimental results show a speed up of an order of magnitude compared to a general purpose solver (CPLEX) with a relative gap of less than 2%. We also observe that for some instances our method can discover better solutions than CPLEX within the time limit.
LGMay 19, 2021
Periodic Freight Demand Estimation for Large-scale Tactical PlanningGreta Laage, Emma Frejinger, Gilles Savard
Freight carriers rely on tactical planning to design their service network to satisfy demand in a cost-effective way. For computational tractability, deterministic and cyclic Service Network Design (SND) formulations are used to solve large-scale problems. A central input is the periodic demand, that is, the demand expected to repeat in every period in the planning horizon. In practice, demand is predicted by a time series forecasting model and the periodic demand is the average of those forecasts. This is, however, only one of many possible mappings. The problem consisting in selecting this mapping has hitherto been overlooked in the literature. We propose to use the structure of the downstream decision-making problem to select a good mapping. For this purpose, we introduce a multilevel mathematical programming formulation that explicitly links the time series forecasts to the SND problem of interest. The solution is a periodic demand estimate that minimizes costs over the tactical planning horizon. We report results in an extensive empirical study of a large-scale application from the Canadian National Railway Company. They clearly show the importance of the periodic demand estimation problem. Indeed, the planning costs exhibit an important variation over different periodic demand estimates and using an estimate different from the mean forecast can lead to substantial cost reductions. Moreover, the costs associated with the periodic demand estimates based on forecasts were comparable to, or even better than those obtained using the mean of actual demand.
OCJan 29, 2021
Reinforcement Learning for Freight Booking Control ProblemsJustin Dumouchelle, Emma Frejinger, Andrea Lodi
Booking control problems are sequential decision-making problems that occur in the domain of revenue management. More precisely, freight booking control focuses on the problem of deciding to accept or reject bookings: given a limited capacity, accept a booking request or reject it to reserve capacity for future bookings with potentially higher revenue. This problem can be formulated as a finite-horizon stochastic dynamic program, where accepting a set of requests results in a profit at the end of the booking period that depends on the cost of fulfilling the accepted bookings. For many freight applications, the cost of fulfilling requests is obtained by solving an operational decision-making problem, which often requires the solutions to mixed-integer linear programs. Routinely solving such operational problems when deploying reinforcement learning algorithms may be too time consuming. The majority of booking control policies are obtained by solving problem-specific mathematical programming relaxations that are often non-trivial to generalize to new problems and, in some cases, provide quite crude approximations. In this work, we propose a two-phase approach: we first train a supervised learning model to predict the objective of the operational problem, and then we deploy the model within reinforcement learning algorithms to compute control policies. This approach is general: it can be used every time the objective function of the end-of-horizon operational problem can be predicted, and it is particularly suitable to those cases where such problems are computationally hard. Furthermore, it allows one to leverage the recent advances in reinforcement learning as routinely solving the operational problem is replaced with a single prediction. Our methodology is evaluated on two booking control problems in the literature, namely, distributional logistics and airline cargo management.
LGJan 13, 2021
Assessing the Impact: Does an Improvement to a Revenue Management System Lead to an Improved Revenue?Greta Laage, Emma Frejinger, Andrea Lodi et al.
Airlines and other industries have been making use of sophisticated Revenue Management Systems to maximize revenue for decades. While improving the different components of these systems has been the focus of numerous studies, estimating the impact of such improvements on the revenue has been overlooked in the literature despite its practical importance. Indeed, quantifying the benefit of a change in a system serves as support for investment decisions. This is a challenging problem as it corresponds to the difference between the generated value and the value that would have been generated keeping the system as before. The latter is not observable. Moreover, the expected impact can be small in relative value. In this paper, we cast the problem as counterfactual prediction of unobserved revenue. The impact on revenue is then the difference between the observed and the estimated revenue. The originality of this work lies in the innovative application of econometric methods proposed for macroeconomic applications to a new problem setting. Broadly applicable, the approach benefits from only requiring revenue data observed for origin-destination pairs in the network of the airline at each day, before and after a change in the system is applied. We report results using real large-scale data from Air Canada. We compare a deep neural network counterfactual predictions model with econometric models. They achieve respectively 1% and 1.1% of error on the counterfactual revenue predictions, and allow to accurately estimate small impacts (in the order of 2%).
OCDec 17, 2019
A learning-based algorithm to quickly compute good primal solutions for Stochastic Integer ProgramsYoshua Bengio, Emma Frejinger, Andrea Lodi et al.
We propose a novel approach using supervised learning to obtain near-optimal primal solutions for two-stage stochastic integer programming (2SIP) problems with constraints in the first and second stages. The goal of the algorithm is to predict a "representative scenario" (RS) for the problem such that, deterministically solving the 2SIP with the random realization equal to the RS, gives a near-optimal solution to the original 2SIP. Predicting an RS, instead of directly predicting a solution ensures first-stage feasibility of the solution. If the problem is known to have complete recourse, second-stage feasibility is also guaranteed. For computational testing, we learn to find an RS for a two-stage stochastic facility location problem with integer variables and linear constraints in both stages and consistently provide near-optimal solutions. Our computing times are very competitive with those of general-purpose integer programming solvers to achieve a similar solution quality.
LGOct 18, 2019
A language processing algorithm for predicting tactical solutions to an operational planning problem under uncertaintyEmma Frejinger, Eric Larsen
This paper is devoted to the prediction of solutions to a stochastic discrete optimization problem. Through an application, we illustrate how we can use a state-of-the-art neural machine translation (NMT) algorithm to predict the solutions by defining appropriate vocabularies, syntaxes and constraints. We attend to applications where the predictions need to be computed in very short computing time -- in the order of milliseconds or less. The results show that with minimal adaptations to the model architecture and hyperparameter tuning, the NMT algorithm can produce accurate solutions within the computing time budget. While these predictions are slightly less accurate than approximate stochastic programming solutions (sample average approximation), they can be computed faster and with less variability.
MLMay 2, 2019
A tutorial on recursive models for analyzing and predicting path choice behaviorMaëlle Zimmermann, Emma Frejinger
The problem at the heart of this tutorial consists in modeling the path choice behavior of network users. This problem has been extensively studied in transportation science, where it is known as the route choice problem. In this literature, individuals' choice of paths are typically predicted using discrete choice models. This article is a tutorial on a specific category of discrete choice models called recursive, and it makes three main contributions: First, for the purpose of assisting future research on route choice, we provide a comprehensive background on the problem, linking it to different fields including inverse optimization and inverse reinforcement learning. Second, we formally introduce the problem and the recursive modeling idea along with an overview of existing models, their properties and applications. Third, we extensively analyze illustrative examples from different angles so that a novice reader can gain intuition on the problem and the advantages provided by recursive models in comparison to path-based ones.
LGJan 22, 2019
Predicting Tactical Solutions to Operational Planning Problems under Imperfect InformationEric Larsen, Sébastien Lachapelle, Yoshua Bengio et al.
This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict tactical solutions to a given operational problem. In this context, the tactical solution is less detailed than the operational one but it has to be computed in very short time and under imperfect information. The problem is of importance in various applications where tactical and operational planning problems are interrelated and information about the operational problem is revealed over time. This is for instance the case in certain capacity planning and demand management systems. We formulate the problem as a two-stage optimal prediction stochastic program whose solution we predict with a supervised machine learning algorithm. The training data set consists of a large number of deterministic (second stage) problems generated by controlled probabilistic sampling. The labels are computed based on solutions to the deterministic problems (solved independently and offline) employing appropriate aggregation and subselection methods to address uncertainty. Results on our motivating application in load planning for rail transportation show that deep learning algorithms produce highly accurate predictions in very short computing time (milliseconds or less). The prediction accuracy is comparable to solutions computed by sample average approximation of the stochastic program.
LGJul 31, 2018
Predicting Tactical Solutions to Operational Planning Problems under Imperfect InformationEric Larsen, Sébastien Lachapelle, Yoshua Bengio et al.
This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict expected tactical descriptions of operational solutions (TDOSs). The problem we address occurs in the context of two-stage stochastic programming where the second stage is demanding computationally. We aim to predict at a high speed the expected TDOS associated with the second stage problem, conditionally on the first stage variables. This may be used in support of the solution to the overall two-stage problem by avoiding the online generation of multiple second stage scenarios and solutions. We formulate the tactical prediction problem as a stochastic optimal prediction program, whose solution we approximate with supervised machine learning. The training dataset consists of a large number of deterministic operational problems generated by controlled probabilistic sampling. The labels are computed based on solutions to these problems (solved independently and offline), employing appropriate aggregation and subselection methods to address uncertainty. Results on our motivating application on load planning for rail transportation show that deep learning models produce accurate predictions in very short computing time (milliseconds or less). The predictive accuracy is close to the lower bounds calculated based on sample average approximation of the stochastic prediction programs.