MLFeb 5
Decision-Focused Sequential Experimental Design: A Directional Uncertainty-Guided ApproachBeichen Wan, Mo Liu, Paul Grigas et al.
We consider the sequential experimental design problem in the predict-then-optimize paradigm. In this paradigm, the outputs of the prediction model are used as coefficient vectors in a downstream linear optimization problem. Traditional sequential experimental design aims to control the input variables (features) so that the improvement in prediction accuracy from each experimental outcome (label) is maximized. However, in the predict-then-optimize setting, performance is ultimately evaluated based on the decision loss induced by the downstream optimization, rather than by prediction error. This mismatch between prediction accuracy and decision loss renders traditional decision-blind designs inefficient. To address this issue, we propose a directional-based metric to quantify predictive uncertainty. This metric does not require solving an optimization oracle and is therefore computationally tractable. We show that the resulting sequential design criterion enjoys strong consistency and convergence guarantees. Under a broad class of distributions, we demonstrate that our directional uncertainty-based design attains an earlier stopping time than decision-blind designs. This advantage is further supported by real-world experiments on an LLM job allocation problem.
94.1DSApr 1
Asymptotically Optimal Sequential Testing with Heterogeneous LLMsGuokai Li, Jiaxin, Liang et al.
We study a Bayesian binary sequential hypothesis testing problem with multiple large language models (LLMs). Each LLM $j$ has per-query cost $c_j>0$, random waiting time with mean $μ_j>0$ and sub-Gaussian tails, and \emph{asymmetric} accuracies: the probability of returning the correct label depends on the true hypothesis $θ\in\{A,B\}$ and needs not be the same under $A$ and $B$. This asymmetry induces two distinct information rates $(I_{j,A}, I_{j,B})$ per LLM, one under each hypothesis. The decision-maker chooses LLMs sequentially, observes their noisy binary answers, and stops when the posterior probability of one hypothesis exceeds $1-α$. The objective is to minimize the sum of expected query cost and expected waiting cost, $\mathbb{E}[C_Ï] + \mathbb{E}[g(W_Ï)]$, where $C_Ï$ is the total query cost, $W_Ï$ is the total waiting time and $g$ is a polynomial function (e.g., $g(x)=x^Ï$ with $Ï\ge 1$). We prove that as the error tolerance $α\to0$, the optimal policy is asymptotically equivalent to one that uses at most two LLMs. In this case, a single-LLM policy is \emph{not} generically optimal: optimality now requires exploiting a two-dimensional tradeoff between information under $A$ and information under $B$. Any admissible policy induces an expected information-allocation vector in $\mathbb{R}_+^2$, and we show that the optimal allocation lies at an extreme point of the associated convex set when $α$ is relatively small, and hence uses at most two LLMs. We construct belief-dependent policies that first mix between two LLMs when the posterior is ambiguous, and then switch to a single ``specialist'' LLM when the posterior is sufficiently close to one of the hypotheses. These policies match the universal lower bound up to a $(1+o(1))$ factor as $α\rightarrow 0$.
LGMay 11, 2023
Active Learning For Contextual Linear Optimization: A Margin-Based ApproachMo Liu, Paul Grigas, Heyuan Liu et al.
We develop the first active learning method for contextual linear optimization. Specifically, we introduce a label acquisition algorithm that sequentially decides whether to request the ``labels'' of feature samples from an unlabeled data stream, where the labels correspond to the coefficients of the objective in the linear optimization. Our method is the first to be directly informed by the decision loss induced by the predicted coefficients, referred to as the Smart Predict-then-Optimize (SPO) loss. Motivated by the structure of the SPO loss, our algorithm adopts a margin-based criterion utilizing the concept of distance to degeneracy. In particular, we design an efficient active learning algorithm with theoretical excess risk (i.e., generalization) guarantees. We derive upper bounds on the label complexity, defined as the number of samples whose labels are acquired to achieve a desired small level of SPO risk. These bounds show that our algorithm has a much smaller label complexity than the naive supervised learning approach that labels all samples, particularly when the SPO loss is minimized directly on the collected data. To address the discontinuity and nonconvexity of the SPO loss, we derive label complexity bounds under tractable surrogate loss functions. Under natural margin conditions, these bounds also outperform naive supervised learning. Using the SPO+ loss, a specialized surrogate of the SPO loss, we establish even tighter bounds under separability conditions. Finally, we present numerical evidence showing the practical value of our algorithms in settings such as personalized pricing and the shortest path problem.