OCApr 15, 2022
Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer CutsMaria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm et al.
The incorporation of cutting planes within the branch-and-bound algorithm, known as branch-and-cut, forms the backbone of modern integer programming solvers. These solvers are the foremost method for solving discrete optimization problems and thus have a vast array of applications in machine learning, operations research, and many other fields. Choosing cutting planes effectively is a major research topic in the theory and practice of integer programming. We conduct a novel structural analysis of branch-and-cut that pins down how every step of the algorithm is affected by changes in the parameters defining the cutting planes added to the input integer program. Our main application of this analysis is to derive sample complexity guarantees for using machine learning to determine which cutting planes to apply during branch-and-cut. These guarantees apply to infinite families of cutting planes, such as the family of Gomory mixed integer cuts, which are responsible for the main breakthrough speedups of integer programming solvers. We exploit geometric and combinatorial structure of branch-and-cut in our analysis, which provides a key missing piece for the recent generalization theory of branch-and-cut.
GTFeb 20, 2023
Leveraging Reviews: Learning to Price with Buyer and Seller UncertaintyWenshuo Guo, Nika Haghtalab, Kirthevasan Kandasamy et al.
In online marketplaces, customers have access to hundreds of reviews for a single product. Buyers often use reviews from other customers that share their type -- such as height for clothing, skin type for skincare products, and location for outdoor furniture -- to estimate their values, which they may not know a priori. Customers with few relevant reviews may hesitate to make a purchase except at a low price, so for the seller, there is a tension between setting high prices and ensuring that there are enough reviews so that buyers can confidently estimate their values. Simultaneously, sellers may use reviews to gauge the demand for items they wish to sell. In this work, we study this pricing problem in an online setting where the seller interacts with a set of buyers of finitely many types, one by one, over a series of $T$ rounds. At each round, the seller first sets a price. Then a buyer arrives and examines the reviews of the previous buyers with the same type, which reveal those buyers' ex-post values. Based on the reviews, the buyer decides to purchase if they have good reason to believe that their ex-ante utility is positive. Crucially, the seller does not know the buyer's type when setting the price, nor even the distribution over types. We provide a no-regret algorithm that the seller can use to obtain high revenue. When there are $d$ types, after $T$ rounds, our algorithm achieves a problem-independent $\tilde O(T^{2/3}d^{1/3})$ regret bound. However, when the smallest probability $q_{\text{min}}$ that any given type appears is large, specifically when $q_{\text{min}} \in Ω(d^{-2/3}T^{-1/3})$, then the same algorithm achieves a $\tilde O(T^{1/2}q_{\text{min}}^{-1/2})$ regret bound. We complement these upper bounds with matching lower bounds in both regimes, showing that our algorithm is minimax optimal up to lower-order terms.
AIFeb 20, 2025Code
EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization FormulationsHaotian Zhai, Connor Lawless, Ellen Vitercik et al.
A fundamental problem in combinatorial optimization is identifying equivalent formulations. Despite the growing need for automated equivalence checks -- driven, for example, by optimization copilots, which generate problem formulations from natural language descriptions -- current approaches rely on simple heuristics that fail to reliably check formulation equivalence. Inspired by Karp reductions, in this work we introduce Quasi-Karp equivalence, a formal criterion for determining when two optimization formulations are equivalent based on the existence of a mapping between their decision variables. We propose EquivaMap, a framework that leverages large language models to automatically discover such mappings for scalable, reliable equivalence checking, with a verification stage that ensures mapped solutions preserve feasibility and optimality without additional solver calls. To evaluate our approach, we construct EquivaFormulation, the first open-source dataset of equivalent optimization formulations, generated by applying transformations such as adding slack variables or valid inequalities to existing formulations. Empirically, EquivaMap significantly outperforms existing methods, achieving substantial improvements in correctly identifying formulation equivalence.
72.5DSApr 21
Smoothed Analysis of Online Metric Matching with a Single Sample: Beyond Metric DistortionYingxi Li, Ellen Vitercik, Mingwei Yang
In the online metric matching problem, $n$ servers and $n$ requests lie in a metric space. Servers are available upfront, and requests arrive sequentially. An arriving request must be matched immediately and irrevocably to an available server, incurring a cost equal to their distance. The goal is to minimize the total matching cost. We study this problem in the Euclidean metric $[0, 1]^d$, when servers are adversarial and requests are independently drawn from distinct distributions that satisfy a mild smoothness condition. Our main result is an $O(1)$-competitive algorithm for $d \neq 2$ that requires no distributional knowledge, relying only on a single sample from each request distribution. To our knowledge, this is the first algorithm to achieve an $o(\log n)$ competitive ratio for non-trivial metrics beyond the i.i.d. setting. Our approach bypasses the $Ω(\log n)$ barrier introduced by probabilistic metric embeddings: instead of analyzing the embedding distortion and the algorithm separately, we directly bound the cost of the algorithm on the target metric of a simple deterministic embedding. We then combine this analysis with lower bounds on the offline optimum for Euclidean metrics, derived via majorization arguments, to obtain our guarantees.
LGDec 16, 2024
LLMs for Cold-Start Cutting Plane Separator ConfigurationConnor Lawless, Yingxi Li, Anders Wikum et al.
Mixed integer linear programming (MILP) solvers expose hundreds of parameters that have an outsized impact on performance but are difficult to configure for all but expert users. Existing machine learning (ML) approaches require training on thousands of related instances, generalize poorly and can be difficult to integrate into existing solver workflows. We propose a large language model (LLM)-based framework that configures cutting plane separators using problem descriptions and solver-specific separator summaries. To reduce variance in LLM outputs, we introduce an ensembling strategy that clusters and aggregates candidate configurations into a small portfolio of high-performing configurations. Our method requires no custom solver interface, generates configurations in seconds via simple API calls, and requires solving only a small number of instances. Extensive experiments on standard synthetic and real-world MILPs show our approach matches or outperforms state-of-the-art configuration methods with a fraction of the data and computation.
MLFeb 5, 2025
Algorithms with Calibrated Machine Learning PredictionsJudy Hanwen Shen, Ellen Vitercik, Anders Wikum · stanford
The field of algorithms with predictions incorporates machine learning advice in the design of online algorithms to improve real-world performance. A central consideration is the extent to which predictions can be trusted -- while existing approaches often require users to specify an aggregate trust level, modern machine learning models can provide estimates of prediction-level uncertainty. In this paper, we propose calibration as a principled and practical tool to bridge this gap, demonstrating the benefits of calibrated advice through two case studies: the ski rental and online job scheduling problems. For ski rental, we design an algorithm that achieves near-optimal prediction-dependent performance and prove that, in high-variance settings, calibrated advice offers more effective guidance than alternative methods for uncertainty quantification. For job scheduling, we demonstrate that using a calibrated predictor leads to significant performance improvements over existing methods. Evaluations on real-world data validate our theoretical findings, highlighting the practical impact of calibration for algorithms with predictions.
LGFeb 23, 2025
Subsampling Graphs with GNN Performance GuaranteesMika Sarkin Jain, Stefanie Jegelka, Ishani Karmarkar et al.
How can we subsample graph data so that a graph neural network (GNN) trained on the subsample achieves performance comparable to training on the full dataset? This question is of fundamental interest, as smaller datasets reduce labeling costs, storage requirements, and computational resources needed for training. Selecting an effective subset is challenging: a poorly chosen subsample can severely degrade model performance, and empirically testing multiple subsets for quality obviates the benefits of subsampling. Therefore, it is critical that subsampling comes with guarantees on model performance. In this work, we introduce new subsampling methods for graph datasets that leverage the Tree Mover's Distance to reduce both the number of graphs and the size of individual graphs. To our knowledge, our approach is the first that is supported by rigorous theoretical guarantees: we prove that training a GNN on the subsampled data results in a bounded increase in loss compared to training on the full dataset. Unlike existing methods, our approach is both model-agnostic, requiring minimal assumptions about the GNN architecture, and label-agnostic, eliminating the need to label the full training set. This enables subsampling early in the model development pipeline (before data annotation, model selection, and hyperparameter tuning) reducing costs and resources needed for storage, labeling, and training. We validate our theoretical results with experiments showing that our approach outperforms existing subsampling methods across multiple datasets.
LGMay 29, 2025
Primal-Dual Neural Algorithmic ReasoningYu He, Ellen Vitercik
Neural Algorithmic Reasoning (NAR) trains neural networks to simulate classical algorithms, enabling structured and interpretable reasoning over complex data. While prior research has predominantly focused on learning exact algorithms for polynomial-time-solvable problems, extending NAR to harder problems remains an open challenge. In this work, we introduce a general NAR framework grounded in the primal-dual paradigm, a classical method for designing efficient approximation algorithms. By leveraging a bipartite representation between primal and dual variables, we establish an alignment between primal-dual algorithms and Graph Neural Networks. Furthermore, we incorporate optimal solutions from small instances to greatly enhance the model's reasoning capabilities. Our empirical results demonstrate that our model not only simulates but also outperforms approximation algorithms for multiple tasks, exhibiting robust generalization to larger and out-of-distribution graphs. Moreover, we highlight the framework's practical utility by integrating it with commercial solvers and applying it to real-world datasets.
LGMay 29, 2025
Can LLMs Reason Structurally? An Evaluation via the Lens of Data StructuresYu He, Yingxi Li, Colin White et al.
As large language models (LLMs) take on increasingly complex tasks, understanding their algorithmic reasoning abilities has become essential. However, existing evaluations focus on distinct and isolated tasks. We propose a unified diagnostic lens: structural reasoning--understanding and manipulating relationships like order, hierarchy, and connectivity. We introduce DSR-Bench, the first benchmark to systematically evaluate LLM structural reasoning through canonical data structures, which serve as interpretable, algorithmically meaningful abstractions. DSR-Bench spans 20 data structures, 35 operations, and 4,140 synthetically generated problem instances with minimal contamination. The benchmark's hierarchical design pinpoints specific failure modes, while its fully automated evaluation ensures objective and consistent assessment. Benchmarking ten state-of-the-art LLMs reveals critical limitations: the top-performing model scores only 0.498 out of 1 on challenging instances. Three additional evaluation suites reveal further weaknesses: models perform poorly on spatial data and natural language scenarios, and fail to reason over their own generated code. DSR-Bench offers a principled diagnostic tool for structural reasoning, helping expose reasoning bottlenecks and guide the development of more capable and reliable LLMs.
MLDec 12, 2024
Wait-Less Offline Tuning and Re-solving for Online Decision MakingJingruo Sun, Wenzhi Gao, Ellen Vitercik et al.
Online linear programming (OLP) has found broad applications in revenue management and resource allocation. State-of-the-art OLP algorithms achieve low regret by repeatedly solving linear programming (LP) subproblems that incorporate updated resource information. However, LP-based methods are computationally expensive and often inefficient for large-scale applications. In contrast, recent first-order OLP algorithms are more computationally efficient but typically suffer from worse regret guarantees. To address these shortcomings, we propose a new algorithm that combines the strengths of LP-based and first-order OLP methods. The algorithm re-solves the LP subproblems periodically at a predefined frequency $f$ and uses the latest dual prices to guide online decision-making. In addition, a first-order method runs in parallel during each interval between LP re-solves, smoothing resource consumption. Our algorithm achieves $\mathscr{O}(\log (T/f) + \sqrt{f})$ regret, delivering a "wait-less" online decision-making process that balances the computational efficiency of first-order methods and the superior regret guarantee of LP-based methods.
LGFeb 22, 2024
From Large to Small Datasets: Size Generalization for Clustering Algorithm SelectionVaggos Chatziafratis, Ishani Karmarkar, Ellen Vitercik
In clustering algorithm selection, we are given a massive dataset and must efficiently select which clustering algorithm to use. We study this problem in a semi-supervised setting, with an unknown ground-truth clustering that we can only access through expensive oracle queries. Ideally, the clustering algorithm's output will be structurally close to the ground truth. We approach this problem by introducing a notion of size generalization for clustering algorithm accuracy. We identify conditions under which we can (1) subsample the massive clustering instance, (2) evaluate a set of candidate algorithms on the smaller instance, and (3) guarantee that the algorithm with the best accuracy on the small instance will have the best accuracy on the original big instance. We provide theoretical size generalization guarantees for three classic clustering algorithms: single-linkage, k-means++, and (a smoothed variant of) Gonzalez's k-centers heuristic. We validate our theoretical analysis with empirical results, observing that on real-world clustering instances, we can use a subsample of as little as 5% of the data to identify which algorithm is best on the full dataset.
LGOct 8, 2025
Bridged Clustering for Representation Learning: Semi-Supervised Sparse BridgingPatrick Peixuan Ye, Chen Shani, Ellen Vitercik
We introduce Bridged Clustering, a semi-supervised framework to learn predictors from any unpaired input $X$ and output $Y$ dataset. Our method first clusters $X$ and $Y$ independently, then learns a sparse, interpretable bridge between clusters using only a few paired examples. At inference, a new input $x$ is assigned to its nearest input cluster, and the centroid of the linked output cluster is returned as the prediction $\hat{y}$. Unlike traditional SSL, Bridged Clustering explicitly leverages output-only data, and unlike dense transport-based methods, it maintains a sparse and interpretable alignment. Through theoretical analysis, we show that with bounded mis-clustering and mis-bridging rates, our algorithm becomes an effective and efficient predictor. Empirically, our method is competitive with SOTA methods while remaining simple, model-agnostic, and highly label-efficient in low-supervision settings.
LGOct 17, 2024
Algorithmic Content Selection and the Impact of User DisengagementEmilio Calvano, Nika Haghtalab, Ellen Vitercik et al.
Digital services face a fundamental trade-off in content selection: they must balance the immediate revenue gained from high-reward content against the long-term benefits of maintaining user engagement. Traditional multi-armed bandit models assume that users remain perpetually engaged, failing to capture the possibility that users may disengage when dissatisfied, thereby reducing future revenue potential. In this work, we introduce a model for the content selection problem that explicitly accounts for variable user engagement and disengagement. In our framework, content that maximizes immediate reward is not necessarily optimal in terms of fostering sustained user engagement. Our contributions are twofold. First, we develop computational and statistical methods for offline optimization and online learning of content selection policies. For users whose engagement patterns are defined by $k$ distinct levels, we design a dynamic programming algorithm that computes the exact optimal policy in $O(k^2)$ time. Moreover, we derive no-regret learning guarantees for an online learning setting in which the platform serves a series of users with unknown and potentially adversarial engagement patterns. Second, we introduce the concept of modified demand elasticity which captures how small changes in a user's overall satisfaction affect the platform's ability to secure long-term revenue. This notion generalizes classical demand elasticity by incorporating the dynamics of user re-engagement, thereby revealing key insights into the interplay between engagement and revenue. Notably, our analysis uncovers a counterintuitive phenomenon: although higher friction (i.e., a reduced likelihood of re-engagement) typically lowers overall revenue, it can simultaneously lead to higher user engagement under optimal content selection policies.
LGJun 10, 2024
MAGNOLIA: Matching Algorithms via GNNs for Online Value-to-go ApproximationAlexandre Hayderi, Amin Saberi, Ellen Vitercik et al.
Online Bayesian bipartite matching is a central problem in digital marketplaces and exchanges, including advertising, crowdsourcing, ridesharing, and kidney exchange. We introduce a graph neural network (GNN) approach that emulates the problem's combinatorially-complex optimal online algorithm, which selects actions (e.g., which nodes to match) by computing each action's value-to-go (VTG) -- the expected weight of the final matching if the algorithm takes that action, then acts optimally in the future. We train a GNN to estimate VTG and show empirically that this GNN returns high-weight matchings across a variety of tasks. Moreover, we identify a common family of graph distributions in spatial crowdsourcing applications, such as rideshare, under which VTG can be efficiently approximated by aggregating information within local neighborhoods in the graphs. This structure matches the local behavior of GNNs, providing theoretical justification for our approach.
LGMar 3, 2024
Bandit Profit-maximization for Targeted MarketingJoon Suk Huh, Ellen Vitercik, Kirthevasan Kandasamy
We study a sequential profit-maximization problem, optimizing for both price and ancillary variables like marketing expenditures. Specifically, we aim to maximize profit over an arbitrary sequence of multiple demand curves, each dependent on a distinct ancillary variable, but sharing the same price. A prototypical example is targeted marketing, where a firm (seller) wishes to sell a product over multiple markets. The firm may invest different marketing expenditures for different markets to optimize customer acquisition, but must maintain the same price across all markets. Moreover, markets may have heterogeneous demand curves, each responding to prices and marketing expenditures differently. The firm's objective is to maximize its gross profit, the total revenue minus marketing costs. Our results are near-optimal algorithms for this class of problems in an adversarial bandit setting, where demand curves are arbitrary non-adaptive sequences, and the firm observes only noisy evaluations of chosen points on the demand curves. For $n$ demand curves (markets), we prove a regret upper bound of $\tilde{O}(nT^{3/4})$ and a lower bound of $Ω((nT)^{3/4})$ for monotonic demand curves, and a regret bound of $\tildeΘ(nT^{2/3})$ for demands curves that are monotonic in price and concave in the ancillary variables.
CYMay 23, 2023
Disincentivizing Polarization in Social NetworksChristian Borgs, Jennifer Chayes, Christian Ikeokwu et al.
On social networks, algorithmic personalization drives users into filter bubbles where they rarely see content that deviates from their interests. We present a model for content curation and personalization that avoids filter bubbles, along with algorithmic guarantees and nearly matching lower bounds. In our model, the platform interacts with $n$ users over $T$ timesteps, choosing content for each user from $k$ categories. The platform receives stochastic rewards as in a multi-arm bandit. To avoid filter bubbles, we draw on the intuition that if some users are shown some category of content, then all users should see at least a small amount of that content. We first analyze a naive formalization of this intuition and show it has unintended consequences: it leads to ``tyranny of the majority'' with the burden of diversification borne disproportionately by those with minority interests. This leads us to our model which distributes this burden more equitably. We require that the probability any user is shown a particular type of content is at least $γ$ times the average probability all users are shown that type of content. Full personalization corresponds to $γ= 0$ and complete homogenization corresponds to $γ= 1$; hence, $γ$ encodes a hard cap on the level of personalization. We also analyze additional formulations where the platform can exceed its cap but pays a penalty proportional to its constraint violation. We provide algorithmic guarantees for optimizing recommendations subject to these constraints. These include nearly matching upper and lower bounds for the entire range of $γ\in [0,1]$ showing that the reward of a multi-agent variant of UCB is nearly optimal. Using real-world preference data, we empirically verify that under our model, users share the burden of diversification with only minor utility loss under our constraints.
LGFeb 22, 2022
No-Regret Learning in Partially-Informed AuctionsWenshuo Guo, Michael I. Jordan, Ellen Vitercik
Auctions with partially-revealed information about items are broadly employed in real-world applications, but the underlying mechanisms have limited theoretical support. In this work, we study a machine learning formulation of these types of mechanisms, presenting algorithms that are no-regret from the buyer's perspective. Specifically, a buyer who wishes to maximize his utility interacts repeatedly with a platform over a series of $T$ rounds. In each round, a new item is drawn from an unknown distribution and the platform publishes a price together with incomplete, "masked" information about the item. The buyer then decides whether to purchase the item. We formalize this problem as an online learning task where the goal is to have low regret with respect to a myopic oracle that has perfect knowledge of the distribution over items and the seller's masking function. When the distribution over items is known to the buyer and the mask is a SimHash function mapping $\mathbb{R}^d$ to $\{0,1\}^{\ell}$, our algorithm has regret $\tilde O((Td\ell)^{1/2})$. In a fully agnostic setting when the mask is an arbitrary function mapping to a set of size $n$ and the prices are stochastic, our algorithm has regret $\tilde O((Tn)^{1/2})$.
LGNov 18, 2021
Improved Sample Complexity Bounds for Branch-and-CutMaria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm et al.
Branch-and-cut is the most widely used algorithm for solving integer programs, employed by commercial solvers like CPLEX and Gurobi. Branch-and-cut has a wide variety of tunable parameters that have a huge impact on the size of the search tree that it builds, but are challenging to tune by hand. An increasingly popular approach is to use machine learning to tune these parameters: using a training set of integer programs from the application domain at hand, the goal is to find a configuration with strong predicted performance on future, unseen integer programs from the same domain. If the training set is too small, a configuration may have good performance over the training set but poor performance on future integer programs. In this paper, we prove sample complexity guarantees for this procedure, which bound how large the training set should be to ensure that for any configuration, its average performance over the training set is close to its expected future performance. Our guarantees apply to parameters that control the most important aspects of branch-and-cut: node selection, branching constraint selection, and cutting plane selection, and are sharper and more general than those found in prior research.
AIJun 8, 2021
Sample Complexity of Tree Search Configuration: Cutting Planes and BeyondMaria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm et al.
Cutting-plane methods have enabled remarkable successes in integer programming over the last few decades. State-of-the-art solvers integrate a myriad of cutting-plane techniques to speed up the underlying tree-search algorithm used to find optimal solutions. In this paper we prove the first guarantees for learning high-performing cut-selection policies tailored to the instance distribution at hand using samples. We first bound the sample complexity of learning cutting planes from the canonical family of Chvátal-Gomory cuts. Our bounds handle any number of waves of any number of cuts and are fine tuned to the magnitudes of the constraint coefficients. Next, we prove sample complexity bounds for more sophisticated cut selection policies that use a combination of scoring rules to choose from a family of cuts. Finally, beyond the realm of cutting planes for integer programming, we develop a general abstraction of tree search that captures key components such as node selection and variable selection. For this abstraction, we bound the sample complexity of learning a good policy for building the search tree.
AIDec 24, 2020
Generalization in portfolio-based algorithm selectionMaria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik
Portfolio-based algorithm selection has seen tremendous practical success over the past two decades. This algorithm configuration procedure works by first selecting a portfolio of diverse algorithm parameter settings, and then, on a given problem instance, using an algorithm selector to choose a parameter setting from the portfolio with strong predicted performance. Oftentimes, both the portfolio and the algorithm selector are chosen using a training set of typical problem instances from the application domain at hand. In this paper, we provide the first provable guarantees for portfolio-based algorithm selection. We analyze how large the training set should be to ensure that the resulting algorithm selector's average performance over the training set is close to its future (expected) performance. This involves analyzing three key reasons why these two quantities may diverge: 1) the learning-theoretic complexity of the algorithm selector, 2) the size of the portfolio, and 3) the learning-theoretic complexity of the algorithm's performance as a function of its parameters. We introduce an end-to-end learning-theoretic analysis of the portfolio construction and algorithm selection together. We prove that if the portfolio is large, overfitting is inevitable, even with an extremely simple algorithm selector. With experiments, we illustrate a tradeoff exposed by our theoretical analysis: as we increase the portfolio size, we can hope to include a well-suited parameter setting for every possible problem instance, but it becomes impossible to avoid overfitting.
LGJul 2, 2020
Private Optimization Without Constraint ViolationsAndrés Muñoz Medina, Umar Syed, Sergei Vassilvitskii et al.
We study the problem of differentially private optimization with linear constraints when the right-hand-side of the constraints depends on private data. This type of problem appears in many applications, especially resource allocation. Previous research provided solutions that retained privacy but sometimes violated the constraints. In many settings, however, the constraints cannot be violated under any circumstances. To address this hard requirement, we present an algorithm that releases a nearly-optimal solution satisfying the constraints with probability 1. We also prove a lower bound demonstrating that the difference between the objective value of our algorithm's solution and the optimal solution is tight up to logarithmic factors among all differentially private algorithms. We conclude with experiments demonstrating that our algorithm can achieve nearly optimal performance while preserving privacy.
AIJun 21, 2020
Refined bounds for algorithm configuration: The knife-edge of dual class approximabilityMaria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik
Automating algorithm configuration is growing increasingly necessary as algorithms come with more and more tunable parameters. It is common to tune parameters using machine learning, optimizing performance metrics such as runtime and solution quality. The training set consists of problem instances from the specific domain at hand. We investigate a fundamental question about these techniques: how large should the training set be to ensure that a parameter's average empirical performance over the training set is close to its expected, future performance? We answer this question for algorithm configuration problems that exhibit a widely-applicable structure: the algorithm's performance as a function of its parameters can be approximated by a "simple" function. We show that if this approximation holds under the L-infinity norm, we can provide strong sample complexity bounds. On the flip side, if the approximation holds only under the L-p norm for p smaller than infinity, it is not possible to provide meaningful sample complexity bounds in the worst case. We empirically evaluate our bounds in the context of integer programming, one of the most powerful tools in computer science. Via experiments, we obtain sample complexity bounds that are up to 700 times smaller than the previously best-known bounds.
LGAug 8, 2019
How much data is sufficient to learn high-performing algorithms? Generalization guarantees for data-driven algorithm designMaria-Florina Balcan, Dan DeBlasio, Travis Dick et al.
Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case approximation ratio or runtime bound. Worst-case instances, however, may be rare or nonexistent in practice. A growing body of research has demonstrated that data-driven algorithm design can lead to significant improvements in performance. This approach uses a training set of problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set. We provide a broadly applicable theory for deriving generalization guarantees that bound the difference between the algorithm's average performance over the training set and its expected performance. Our results apply no matter how the parameters are tuned, be it via an automated or manual approach. The challenge is that for many types of algorithms, performance is a volatile function of the parameters: slightly perturbing the parameters can cause large changes in behavior. Prior research has proved generalization bounds by employing case-by-case analyses of greedy algorithms, clustering algorithms, integer programming algorithms, and selling mechanisms. We uncover a unifying structure which we use to prove extremely general guarantees, yet we recover the bounds from prior research. Our guarantees apply whenever an algorithm's performance is a piecewise-constant, -linear, or -- more generally -- piecewise-structured function of its parameters. Our theory also implies novel bounds for voting mechanisms and dynamic programming algorithms from computational biology.
LGMay 26, 2019
Learning to Optimize Computational Resources: Frugal Training with Generalization GuaranteesMaria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik
Algorithms typically come with tunable parameters that have a considerable impact on the computational resources they consume. Too often, practitioners must hand-tune the parameters, a tedious and error-prone task. A recent line of research provides algorithms that return nearly-optimal parameters from within a finite set. These algorithms can be used when the parameter space is infinite by providing as input a random sample of parameters. This data-independent discretization, however, might miss pockets of nearly-optimal parameters: prior research has presented scenarios where the only viable parameters lie within an arbitrarily small region. We provide an algorithm that learns a finite set of promising parameters from within an infinite set. Our algorithm can help compile a configuration portfolio, or it can be used to select the input to a configuration algorithm for finite parameter spaces. Our approach applies to any configuration problem that satisfies a simple yet ubiquitous structure: the algorithm's performance is a piecewise constant function of its parameters. Prior research has exhibited this structure in domains from integer programming to clustering.
LGApr 26, 2019
Learning to Prune: Speeding up Repeated ComputationsDaniel Alabi, Adam Tauman Kalai, Katrina Ligett et al.
It is common to encounter situations where one must solve a sequence of similar computational problems. Running a standard algorithm with worst-case runtime guarantees on each instance will fail to take advantage of valuable structure shared across the problem instances. For example, when a commuter drives from work to home, there are typically only a handful of routes that will ever be the shortest path. A naive algorithm that does not exploit this common structure may spend most of its time checking roads that will never be in the shortest path. More generally, we can often ignore large swaths of the search space that will likely never contain an optimal solution. We present an algorithm that learns to maximally prune the search space on repeated computations, thereby reducing runtime while provably outputting the correct solution each period with high probability. Our algorithm employs a simple explore-exploit technique resembling those used in online algorithms, though our setting is quite different. We prove that, with respect to our model of pruning search spaces, our approach is optimal up to constant factors. Finally, we illustrate the applicability of our model and algorithm to three classic problems: shortest-path routing, string search, and linear programming. We present experiments confirming that our simple algorithm is effective at significantly reducing the runtime of solving repeated computations.
AIMar 27, 2018
Learning to BranchMaria-Florina Balcan, Travis Dick, Tuomas Sandholm et al.
Tree search algorithms, such as branch-and-bound, are the most widely used tools for solving combinatorial and nonconvex problems. For example, they are the foremost method for solving (mixed) integer programs and constraint satisfaction problems. Tree search algorithms recursively partition the search space to find an optimal solution. In order to keep the tree size small, it is crucial to carefully decide, when expanding a tree node, which question (typically variable) to branch on at that node in order to partition the remaining space. Numerous partitioning techniques (e.g., variable selection) have been proposed, but there is no theory describing which technique is optimal. We show how to use machine learning to determine an optimal weighting of any set of partitioning procedures for the instance distribution at hand using samples from the distribution. We provide the first sample complexity guarantees for tree search algorithm configuration. These guarantees bound the number of samples sufficient to ensure that the empirical performance of an algorithm over the samples nearly matches its expected performance on the unknown instance distribution. This thorough theoretical investigation naturally gives rise to our learning algorithm. Via experiments, we show that learning an optimal weighting of partitioning procedures can dramatically reduce tree size, and we prove that this reduction can even be exponential. Through theory and experiments, we show that learning to branch is both practical and hugely beneficial.
LGNov 8, 2017
Dispersion for Data-Driven Algorithm Design, Online Learning, and Private OptimizationMaria-Florina Balcan, Travis Dick, Ellen Vitercik
Data-driven algorithm design, that is, choosing the best algorithm for a specific application, is a crucial problem in modern data science. Practitioners often optimize over a parameterized algorithm family, tuning parameters based on problems from their domain. These procedures have historically come with no guarantees, though a recent line of work studies algorithm selection from a theoretical perspective. We advance the foundations of this field in several directions: we analyze online algorithm selection, where problems arrive one-by-one and the goal is to minimize regret, and private algorithm selection, where the goal is to find good parameters over a set of problems without revealing sensitive information contained therein. We study important algorithm families, including SDP-rounding schemes for problems formulated as integer quadratic programs, and greedy techniques for canonical subset selection problems. In these cases, the algorithm's performance is a volatile and piecewise Lipschitz function of its parameters, since tweaking the parameters can completely change the algorithm's behavior. We give a sufficient and general condition, dispersion, defining a family of piecewise Lipschitz functions that can be optimized online and privately, which includes the functions measuring the performance of the algorithms we study. Intuitively, a set of piecewise Lipschitz functions is dispersed if no small region contains many of the functions' discontinuities. We present general techniques for online and private optimization of the sum of dispersed piecewise Lipschitz functions. We improve over the best-known regret bounds for a variety of problems, prove regret bounds for problems not previously studied, and give matching lower bounds. We also give matching upper and lower bounds on the utility loss due to privacy. Moreover, we uncover dispersion in auction design and pricing problems.
LGApr 29, 2017
Generalization Guarantees for Multi-item Profit Maximization: Pricing, Auctions, and Randomized MechanismsMaria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik
We study multi-item profit maximization when there is an underlying distribution over buyers' values. In practice, a full description of the distribution is typically unavailable, so we study the setting where the mechanism designer only has samples from the distribution. If the designer uses the samples to optimize over a complex mechanism class -- such as the set of all multi-item, multi-buyer mechanisms -- a mechanism may have high average profit over the samples but low expected profit. This raises the central question of this paper: how many samples are sufficient to ensure that a mechanism's average profit is close to its expected profit? To answer this question, we uncover structure shared by many pricing, auction, and lottery mechanisms: for any set of buyers' values, profit is piecewise linear in the mechanism's parameters. Using this structure, we prove new bounds for mechanism classes not yet studied in the sample-based mechanism design literature and match or improve over the best-known guarantees for many classes.
DSNov 14, 2016
Learning-Theoretic Foundations of Algorithm Configuration for Combinatorial Partitioning ProblemsMaria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik et al.
Max-cut, clustering, and many other partitioning problems that are of significant importance to machine learning and other scientific fields are NP-hard, a reality that has motivated researchers to develop a wealth of approximation algorithms and heuristics. Although the best algorithm to use typically depends on the specific application domain, a worst-case analysis is often used to compare algorithms. This may be misleading if worst-case instances occur infrequently, and thus there is a demand for optimization methods which return the algorithm configuration best suited for the given application's typical inputs. We address this problem for clustering, max-cut, and other partitioning problems, such as integer quadratic programming, by designing computationally efficient and sample efficient learning algorithms which receive samples from an application-specific distribution over problem instances and learn a partitioning algorithm with high expected performance. Our algorithms learn over common integer quadratic programming and clustering algorithm families: SDP rounding algorithms and agglomerative clustering algorithms with dynamic programming. For our sample complexity analysis, we provide tight bounds on the pseudodimension of these algorithm classes, and show that surprisingly, even for classes of algorithms parameterized by a single parameter, the pseudo-dimension is superconstant. In this way, our work both contributes to the foundations of algorithm configuration and pushes the boundaries of learning theory, since the algorithm classes we analyze consist of multi-stage optimization procedures and are significantly more complex than classes typically studied in learning theory.
LGJun 13, 2016
Sample Complexity of Automated Mechanism DesignMaria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik
The design of revenue-maximizing combinatorial auctions, i.e. multi-item auctions over bundles of goods, is one of the most fundamental problems in computational economics, unsolved even for two bidders and two items for sale. In the traditional economic models, it is assumed that the bidders' valuations are drawn from an underlying distribution and that the auction designer has perfect knowledge of this distribution. Despite this strong and oftentimes unrealistic assumption, it is remarkable that the revenue-maximizing combinatorial auction remains unknown. In recent years, automated mechanism design has emerged as one of the most practical and promising approaches to designing high-revenue combinatorial auctions. The most scalable automated mechanism design algorithms take as input samples from the bidders' valuation distribution and then search for a high-revenue auction in a rich auction class. In this work, we provide the first sample complexity analysis for the standard hierarchy of deterministic combinatorial auction classes used in automated mechanism design. In particular, we provide tight sample complexity bounds on the number of samples needed to guarantee that the empirical revenue of the designed mechanism on the samples is close to its expected revenue on the underlying, unknown distribution over bidder valuations, for each of the auction classes in the hierarchy. In addition to helping set automated mechanism design on firm foundations, our results also push the boundaries of learning theory. In particular, the hypothesis functions used in our contexts are defined through multi-stage combinatorial optimization procedures, rather than simple decision boundaries, as are common in machine learning.
LGMay 30, 2016
Learning Combinatorial Functions from Pairwise ComparisonsMaria-Florina Balcan, Ellen Vitercik, Colin White
A large body of work in machine learning has focused on the problem of learning a close approximation to an underlying combinatorial function, given a small set of labeled examples. However, for real-valued functions, cardinal labels might not be accessible, or it may be difficult for an expert to consistently assign real-valued labels over the entire set of examples. For instance, it is notoriously hard for consumers to reliably assign values to bundles of merchandise. Instead, it might be much easier for a consumer to report which of two bundles she likes better. With this motivation in mind, we consider an alternative learning model, wherein the algorithm must learn the underlying function up to pairwise comparisons, from pairwise comparisons. In this model, we present a series of novel algorithms that learn over a wide variety of combinatorial function classes. These range from graph functions to broad classes of valuation functions that are fundamentally important in microeconomic theory, the analysis of social networks, and machine learning, such as coverage, submodular, XOS, and subadditive functions, as well as functions with sparse Fourier support.