Haihao Lu

OC
h-index7
18papers
573citations
Novelty51%
AI Score46

18 Papers

IRJul 10, 2023
Online Ad Procurement in Non-stationary Autobidding Worlds

Jason Cheuk Nam Liang, Haihao Lu, Baoyu Zhou

Today's online advertisers procure digital ad impressions through interacting with autobidding platforms: advertisers convey high level procurement goals via setting levers such as budget, target return-on-investment, max cost per click, etc.. Then ads platforms subsequently procure impressions on advertisers' behalf, and report final procurement conversions (e.g. click) to advertisers. In practice, advertisers may receive minimal information on platforms' procurement details, and procurement outcomes are subject to non-stationary factors like seasonal patterns, occasional system corruptions, and market trends which make it difficult for advertisers to optimize lever decisions effectively. Motivated by this, we present an online learning framework that helps advertisers dynamically optimize ad platform lever decisions while subject to general long-term constraints in a realistic bandit feedback environment with non-stationary procurement outcomes. In particular, we introduce a primal-dual algorithm for online decision making with multi-dimension decision variables, bandit feedback and long-term uncertain constraints. We show that our algorithm achieves low regret in many worlds when procurement outcomes are generated through procedures that are stochastic, adversarial, adversarially corrupted, periodic, and ergodic, respectively, without having to know which procedure is the ground truth. Finally, we emphasize that our proposed algorithm and theoretical results extend beyond the applications of online advertising.

OCApr 6
End-to-End Learning of Correlated Operating Reserve Requirements in Security-Constrained Economic Dispatch

Owen Shen, Hung-po Chao, Haihao Lu et al.

Operating reserve requirements in security-constrained economic dispatch (SCED) depend strongly on the assumed correlation structure of renewable forecast errors, yet that structure is usually specified exogenously rather than learned for the dispatch task itself. This paper formulates correlated reserve-set design as an end-to-end trainable robust optimization problem: choose the ellipsoidal uncertainty-set shape to minimize robust dispatch cost subject to a target coverage requirement. By profiling the coverage constraint into a shape-dependent radius, the original bilevel problem becomes a single-stage differentiable objective, and KKT/dual information from the SCED solve provides task gradients without differentiating through the solver. For unknown distributions, a four-way train/tune/calibrate/test split combines a smoothed quantile-sensitivity estimator for training with split conformal calibration for deployment, yielding finite-sample marginal coverage under exchangeability and a consistent gradient estimator for the smoothed objective. The same task gradient can also be passed upstream to context-dependent encoders, which we report as a secondary extension. The framework is evaluated on the IEEE~118-bus system with a coupled SCED formulation that includes inter-zone transfer constraints. The learned static ellipsoid reduces dispatch cost by about 4.8\% relative to the Sample Covariance baseline while maintaining empirical coverage above the target level.

OCDec 12, 2024Code
MPAX: Mathematical Programming in JAX

Haihao Lu, Zedong Peng, Jinwen Yang

This paper presents MPAX (Mathematical Programming in JAX), a versatile and efficient toolbox for integrating linear programming (LP) into machine learning workflows. MPAX implemented the state-of-the-art first-order methods, restarted average primal-dual hybrid gradient and reflected restarted Halpern primal-dual hybrid gradient, to solve LPs in JAX. This provides native support for hardware accelerations along with features like batch solving, auto-differentiation, and device parallelism. Extensive numerical experiments demonstrate the advantages of MPAX over existing solvers. The solver is available at https://github.com/MIT-Lu-Lab/MPAX.

OCApr 24
FlashFolio: A GPU-Accelerated Solver for Portfolio Optimization

Yilun Jiang, Haihao Lu, Zedong Peng et al.

We present FlashFolio, a GPU-accelerated solver for single-period and multi-period portfolio optimization with factor-based risk modeling, bid-offer spread costs, and nonlinear market impact. These models are widely used in portfolio construction and optimal execution, but become computationally challenging at large scale, especially in the multi-period setting. We benchmark FlashFolio against MOSEK on instances constructed from realistic market inputs. FlashFolio delivers consistent runtime improvements, achieving speedups of up to 12.9x in the single-period setting and 48x in the multi-period setting, while also exhibiting stronger robustness on challenging multi-period instances. Our results show that GPU-based optimization can help improve the practicality of large-scale portfolio optimization.

OCFeb 12, 2022
Analysis of Dual-Based PID Controllers through Convolutional Mirror Descent

Santiago R. Balseiro, Haihao Lu, Vahab Mirrokni et al.

Dual-based proportional-integral-derivative (PID) controllers are often employed in practice to solve online allocation problems with global constraints, such as budget pacing in online advertising. However, controllers are used in a heuristic fashion and come with no provable guarantees on their performance. This paper provides the first regret bounds on the performance of dual-based PID controllers for online allocation problems. We do so by first establishing a fundamental connection between dual-based PID controllers and a new first-order algorithm for online convex optimization called \emph{Convolutional Mirror Descent} (CMD), which updates iterates based on a weighted moving average of past gradients. CMD recovers, in a special case, online mirror descent with momentum and optimistic mirror descent. We establish sufficient conditions under which CMD attains low regret for general online convex optimization problems with adversarial inputs. We leverage this new result to give the first regret bound for dual-based PID controllers for online allocation problems. As a byproduct of our proofs, we provide the first regret bound for CMD for non-smooth convex optimization, which might be of independent interest.

OCNov 10, 2021
Nearly Optimal Linear Convergence of Stochastic Primal-Dual Methods for Linear Programming

Haihao Lu, Jinwen Yang

There is a recent interest on first-order methods for linear programming (LP). In this paper,we propose a stochastic algorithm using variance reduction and restarts for solving sharp primal-dual problems such as LP. We show that the proposed stochastic method exhibits a linear convergence rate for solving sharp instances with a high probability. In addition, we propose an efficient coordinate-based stochastic oracle for unconstrained bilinear problems, which has $\mathcal O(1)$ per iteration cost and improves the complexity of the existing deterministic and stochastic algorithms. Finally, we show that the obtained linear convergence rate is nearly optimal (upto $\log$ terms) for a wide class of stochastic primal dual methods.

DSNov 18, 2020
The Best of Many Worlds: Dual Mirror Descent for Online Allocation Problems

Santiago Balseiro, Haihao Lu, Vahab Mirrokni

Online allocation problems with resource constraints are central problems in revenue management and online advertising. In these problems, requests arrive sequentially during a finite horizon and, for each request, a decision maker needs to choose an action that consumes a certain amount of resources and generates reward. The objective is to maximize cumulative rewards subject to a constraint on the total consumption of resources. In this paper, we consider a data-driven setting in which the reward and resource consumption of each request are generated using an input model that is unknown to the decision maker. We design a general class of algorithms that attain good performance in various input models without knowing which type of input they are facing. In particular, our algorithms are asymptotically optimal under independent and identically distributed inputs as well as various non-stationary stochastic input models, and they attain an asymptotically optimal fixed competitive ratio when the input is adversarial. Our algorithms operate in the Lagrangian dual space: they maintain a dual multiplier for each resource that is updated using online mirror descent. By choosing the reference function accordingly, we recover the dual sub-gradient descent and dual multiplicative weights update algorithm. The resulting algorithms are simple, fast, and do not require convexity in the revenue function, consumption function and action space, in contrast to existing methods for online allocation problems. We discuss applications to network revenue management, online bidding in repeated auctions with budget constraints, online proportional matching with high entropy, and personalized assortment optimization with limited inventory.

OCOct 20, 2020
Limiting Behaviors of Nonconvex-Nonconcave Minimax Optimization via Continuous-Time Systems

Benjamin Grimmer, Haihao Lu, Pratik Worah et al.

Unlike nonconvex optimization, where gradient descent is guaranteed to converge to a local optimizer, algorithms for nonconvex-nonconcave minimax optimization can have topologically different solution paths: sometimes converging to a solution, sometimes never converging and instead following a limit cycle, and sometimes diverging. In this paper, we study the limiting behaviors of three classic minimax algorithms: gradient descent ascent (GDA), alternating gradient descent ascent (AGDA), and the extragradient method (EGM). Numerically, we observe that all of these limiting behaviors can arise in Generative Adversarial Networks (GAN) training and are easily demonstrated for a range of GAN problems. To explain these different behaviors, we study the high-order resolution continuous-time dynamics that correspond to each algorithm, which results in the sufficient (and almost necessary) conditions for the local convergence by each method. Moreover, this ODE perspective allows us to characterize the phase transition between these different limiting behaviors caused by introducing regularization as Hopf Bifurcations.

OCJul 1, 2020
Regularized Online Allocation Problems: Fairness and Beyond

Santiago Balseiro, Haihao Lu, Vahab Mirrokni

Online allocation problems with resource constraints have a rich history in operations research. In this paper, we introduce the \emph{regularized online allocation problem}, a variant that includes a non-linear regularizer acting on the total resource consumption. In this problem, requests repeatedly arrive over time and, for each request, a decision maker needs to take an action that generates a reward and consumes resources. The objective is to simultaneously maximize additively separable rewards and the value of a non-separable regularizer subject to the resource constraints. Our primary motivation is allowing decision makers to trade off separable objectives such as the economic efficiency of an allocation with ancillary, non-separable objectives such as the fairness or equity of an allocation. We design an algorithm that is simple, fast, and attains good performance with both stochastic i.i.d.~and adversarial inputs. In particular, our algorithm is asymptotically optimal under stochastic i.i.d. input models and attains a fixed competitive ratio that depends on the regularizer when the input is adversarial. Furthermore, the algorithm and analysis do not require convexity or concavity of the reward function and the consumption function, which allows more model flexibility. Numerical experiments confirm the effectiveness of the proposed algorithm and of regularization in an internet advertising application.

OCJun 15, 2020
The Landscape of the Proximal Point Method for Nonconvex-Nonconcave Minimax Optimization

Benjamin Grimmer, Haihao Lu, Pratik Worah et al.

Minimax optimization has become a central tool in machine learning with applications in robust optimization, reinforcement learning, GANs, etc. These applications are often nonconvex-nonconcave, but the existing theory is unable to identify and deal with the fundamental difficulties this poses. In this paper, we study the classic proximal point method (PPM) applied to nonconvex-nonconcave minimax problems. We find that a classic generalization of the Moreau envelope by Attouch and Wets provides key insights. Critically, we show this envelope not only smooths the objective but can convexify and concavify it based on the level of interaction present between the minimizing and maximizing variables. From this, we identify three distinct regions of nonconvex-nonconcave problems. When interaction is sufficiently strong, we derive global linear convergence guarantees. Conversely when the interaction is fairly weak, we derive local linear convergence guarantees with a proper initialization. Between these two settings, we show that PPM may diverge or converge to a limit cycle.

OCFeb 20, 2020
Contextual Reserve Price Optimization in Auctions via Mixed-Integer Programming

Joey Huchette, Haihao Lu, Hossein Esfandiari et al.

We study the problem of learning a linear model to set the reserve price in an auction, given contextual information, in order to maximize expected revenue from the seller side. First, we show that it is not possible to solve this problem in polynomial time unless the \emph{Exponential Time Hypothesis} fails. Second, we present a strong mixed-integer programming (MIP) formulation for this problem, which is capable of exactly modeling the nonconvex and discontinuous expected reward function. Moreover, we show that this MIP formulation is ideal (i.e. the strongest possible formulation) for the revenue function of a single impression. Since it can be computationally expensive to exactly solve the MIP formulation in practice, we also study the performance of its linear programming (LP) relaxation. Though it may work well in practice, we show that, unfortunately, in the worst case the optimal objective of the LP relaxation can be O(number of samples) times larger than the optimal objective of the true problem. Finally, we present computational results, showcasing that the MIP formulation, along with its LP relaxation, are able to achieve superior in- and out-of-sample performance, as compared to state-of-the-art algorithms on both real and synthetic datasets. More broadly, we believe this work offers an indication of the strength of optimization methodologies like MIP to exactly model intrinsic discontinuities in machine learning problems.

OCJan 23, 2020
An $O(s^r)$-Resolution ODE Framework for Understanding Discrete-Time Algorithms and Applications to the Linear Convergence of Minimax Problems

Haihao Lu

There has been a long history of using ordinary differential equations (ODEs) to understand the dynamics of discrete-time algorithms (DTAs). Surprisingly, there are still two fundamental and unanswered questions: (i) it is unclear how to obtain a \emph{suitable} ODE from a given DTA, and (ii) it is unclear the connection between the convergence of a DTA and its corresponding ODEs. In this paper, we propose a new machinery -- an $O(s^r)$-resolution ODE framework -- for analyzing the behavior of a generic DTA, which (partially) answers the above two questions. The framework contains three steps: 1. To obtain a suitable ODE from a given DTA, we define a hierarchy of $O(s^r)$-resolution ODEs of a DTA parameterized by the degree $r$, where $s$ is the step-size of the DTA. We present a principal approach to construct the unique $O(s^r)$-resolution ODEs from a DTA; 2. To analyze the resulting ODE, we propose the $O(s^r)$-linear-convergence condition of a DTA with respect to an energy function, under which the $O(s^r)$-resolution ODE converges linearly to an optimal solution; 3. To bridge the convergence properties of a DTA and its corresponding ODEs, we define the properness of an energy function and show that the linear convergence of the $O(s^r)$-resolution ODE with respect to a proper energy function can automatically guarantee the linear convergence of the DTA. To better illustrate this machinery, we utilize it to study three classic algorithms -- gradient descent ascent (GDA), proximal point method (PPM) and extra-gradient method (EGM) -- for solving the unconstrained minimax problem $\min_{x\in\RR^n} \max_{y\in \RR^m} L(x,y)$.

MLJul 9, 2019
Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

Kenji Kawaguchi, Haihao Lu

We propose a new stochastic optimization framework for empirical risk minimization problems such as those that arise in machine learning. The traditional approaches, such as (mini-batch) stochastic gradient descent (SGD), utilize an unbiased gradient estimator of the empirical average loss. In contrast, we develop a computationally efficient method to construct a gradient estimator that is purposely biased toward those observations with higher current losses. On the theory side, we show that the proposed method minimizes a new ordered modification of the empirical average loss, and is guaranteed to converge at a sublinear rate to a global optimum for convex loss and to a critical point for weakly convex (non-convex) loss. Furthermore, we prove a new generalization bound for the proposed algorithm. On the empirical side, the numerical experiments show that our proposed method consistently improves the test errors compared with the standard mini-batch SGD in various models including SVM, logistic regression, and deep learning problems.

LGMar 20, 2019
Accelerating Gradient Boosting Machine

Haihao Lu, Sai Praneeth Karimireddy, Natalia Ponomareva et al.

Gradient Boosting Machine (GBM) is an extremely powerful supervised learning algorithm that is widely used in practice. GBM routinely features as a leading algorithm in machine learning competitions such as Kaggle and the KDDCup. In this work, we propose Accelerated Gradient Boosting Machine (AGBM) by incorporating Nesterov's acceleration techniques into the design of GBM. The difficulty in accelerating GBM lies in the fact that weak (inexact) learners are commonly used, and therefore the errors can accumulate in the momentum term. To overcome it, we design a "corrected pseudo residual" and fit best weak learner to this corrected pseudo residual, in order to perform the z-update. Thus, we are able to derive novel computational guarantees for AGBM. This is the first GBM type of algorithm with theoretically-justified accelerated convergence rate. Finally we demonstrate with a number of numerical experiments the effectiveness of AGBM over conventional GBM in obtaining a model with good training and/or testing data fidelity.

LGOct 24, 2018
Randomized Gradient Boosting Machine

Haihao Lu, Rahul Mazumder

Gradient Boosting Machine (GBM) introduced by Friedman is a powerful supervised learning algorithm that is very widely used in practice---it routinely features as a leading algorithm in machine learning competitions such as Kaggle and the KDDCup. In spite of the usefulness of GBM in practice, our current theoretical understanding of this method is rather limited. In this work, we propose Randomized Gradient Boosting Machine (RGBM) which leads to substantial computational gains compared to GBM, by using a randomization scheme to reduce search in the space of weak-learners. We derive novel computational guarantees for RGBM. We also provide a principled guideline towards better step-size selection in RGBM that does not require a line search. Our proposed framework is inspired by a special variant of coordinate descent that combines the benefits of randomized coordinate descent and greedy coordinate descent; and may be of independent interest as an optimization algorithm. As a special case, our results for RGBM lead to superior computational guarantees for GBM. Our computational guarantees depend upon a curious geometric quantity that we call Minimal Cosine Angle, which relates to the density of weak-learners in the prediction space. On a series of numerical experiments on real datasets, we demonstrate the effectiveness of RGBM over GBM in terms of obtaining a model with good training and/or testing data fidelity with a fraction of the computational cost.

LGOct 4, 2018
Approximate Leave-One-Out for High-Dimensional Non-Differentiable Learning Problems

Shuaiwen Wang, Wenda Zhou, Arian Maleki et al.

Consider the following class of learning schemes: \begin{equation} \label{eq:main-problem1} \hat{\boldsymbolβ} := \underset{\boldsymbolβ \in \mathcal{C}}{\arg\min} \;\sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbolβ; y_j) + λR(\boldsymbolβ), \qquad \qquad \qquad (1) \end{equation} where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\rm th}$ feature and response variable respectively. Let $\ell$ and $R$ be the convex loss function and regularizer, $\boldsymbolβ$ denote the unknown weights, and $λ$ be a regularization parameter. $\mathcal{C} \subset \mathbb{R}^{p}$ is a closed convex set. Finding the optimal choice of $λ$ is a challenging problem in high-dimensional regimes where both $n$ and $p$ are large. We propose three frameworks to obtain a computationally efficient approximation of the leave-one-out cross validation (LOOCV) risk for nonsmooth losses and regularizers. Our three frameworks are based on the primal, dual, and proximal formulations of (1). Each framework shows its strength in certain types of problems. We prove the equivalence of the three approaches under smoothness conditions. This equivalence enables us to justify the accuracy of the three methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization, and support vector machines. We empirically demonstrate the effectiveness of our results for non-differentiable cases.

MLJul 7, 2018
Approximate Leave-One-Out for Fast Parameter Tuning in High Dimensions

Shuaiwen Wang, Wenda Zhou, Haihao Lu et al.

Consider the following class of learning schemes: $$\hat{\boldsymbolβ} := \arg\min_{\boldsymbolβ}\;\sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbolβ; y_j) + λR(\boldsymbolβ),\qquad\qquad (1) $$ where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\text{th}}$ feature and response variable respectively. Let $\ell$ and $R$ be the loss function and regularizer, $\boldsymbolβ$ denote the unknown weights, and $λ$ be a regularization parameter. Finding the optimal choice of $λ$ is a challenging problem in high-dimensional regimes where both $n$ and $p$ are large. We propose two frameworks to obtain a computationally efficient approximation ALO of the leave-one-out cross validation (LOOCV) risk for nonsmooth losses and regularizers. Our two frameworks are based on the primal and dual formulations of (1). We prove the equivalence of the two approaches under smoothness conditions. This equivalence enables us to justify the accuracy of both methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization, and support vector machines. We empirically demonstrate the effectiveness of our results for non-differentiable cases.

LGFeb 27, 2017
Depth Creates No Bad Local Minima

Haihao Lu, Kenji Kawaguchi

In deep learning, \textit{depth}, as well as \textit{nonlinearity}, create non-convex loss surfaces. Then, does depth alone create bad local minima? In this paper, we prove that without nonlinearity, depth alone does not create bad local minima, although it induces non-convex loss surface. Using this insight, we greatly simplify a recently proposed proof to show that all of the local minima of feedforward deep linear neural networks are global minima. Our theoretical results generalize previous results with fewer assumptions, and this analysis provides a method to show similar results beyond square loss in deep linear models.