h-index17
19papers
488citations
Novelty55%
AI Score47

19 Papers

OCFeb 28, 2023
High Probability Convergence of Stochastic Gradient Methods

Zijian Liu, Ta Duy Nguyen, Thien Hang Nguyen et al.

In this work, we describe a generic approach to show convergence with high probability for both stochastic convex and non-convex optimization with sub-Gaussian noise. In previous works for convex optimization, either the convergence is only in expectation or the bound depends on the diameter of the domain. Instead, we show high probability convergence with bounds depending on the initial distance to the optimal solution. The algorithms use step sizes analogous to the standard settings and are universal to Lipschitz functions, smooth functions, and their linear combinations. This method can be applied to the non-convex case. We demonstrate an $O((1+σ^{2}\log(1/δ))/T+σ/\sqrt{T})$ convergence rate when the number of iterations $T$ is known and an $O((1+σ^{2}\log(T/δ))/\sqrt{T})$ convergence rate when $T$ is unknown for SGD, where $1-δ$ is the desired success probability. These bounds improve over existing bounds in the literature. Additionally, we demonstrate that our techniques can be used to obtain high probability bound for AdaGrad-Norm (Ward et al., 2019) that removes the bounded gradients assumption from previous works. Furthermore, our technique for AdaGrad-Norm extends to the standard per-coordinate AdaGrad algorithm (Duchi et al., 2011), providing the first noise-adapted high probability convergence for AdaGrad.

LGSep 29, 2022
On the Convergence of AdaGrad(Norm) on $\R^{d}$: Beyond Convexity, Non-Asymptotic Rate and Acceleration

Zijian Liu, Ta Duy Nguyen, Alina Ene et al.

Existing analysis of AdaGrad and other adaptive methods for smooth convex optimization is typically for functions with bounded domain diameter. In unconstrained problems, previous works guarantee an asymptotic convergence rate without an explicit constant factor that holds true for the entire function class. Furthermore, in the stochastic setting, only a modified version of AdaGrad, different from the one commonly used in practice, in which the latest gradient is not used to update the stepsize, has been analyzed. Our paper aims at bridging these gaps and developing a deeper understanding of AdaGrad and its variants in the standard setting of smooth convex functions as well as the more general setting of quasar convex functions. First, we demonstrate new techniques to explicitly bound the convergence rate of the vanilla AdaGrad for unconstrained problems in both deterministic and stochastic settings. Second, we propose a variant of AdaGrad for which we can show the convergence of the last iterate, instead of the average iterate. Finally, we give new accelerated adaptive algorithms and their convergence guarantee in the deterministic setting with explicit dependency on the problem parameters, improving upon the asymptotic rate shown in previous works.

LGSep 29, 2022
META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions

Zijian Liu, Ta Duy Nguyen, Thien Hang Nguyen et al.

We study the application of variance reduction (VR) techniques to general non-convex stochastic optimization problems. In this setting, the recent work STORM [Cutkosky-Orabona '19] overcomes the drawback of having to compute gradients of "mega-batches" that earlier VR methods rely on. There, STORM utilizes recursive momentum to achieve the VR effect and is then later made fully adaptive in STORM+ [Levy et al., '21], where full-adaptivity removes the requirement for obtaining certain problem-specific parameters such as the smoothness of the objective and bounds on the variance and norm of the stochastic gradients in order to set the step size. However, STORM+ crucially relies on the assumption that the function values are bounded, excluding a large class of useful functions. In this work, we propose META-STORM, a generalized framework of STORM+ that removes this bounded function values assumption while still attaining the optimal convergence rate for non-convex optimization. META-STORM not only maintains full-adaptivity, removing the need to obtain problem specific parameters, but also improves the convergence rate's dependency on the problem parameters. Furthermore, META-STORM can utilize a large range of parameter settings that subsumes previous methods allowing for more flexibility in a wider range of settings. Finally, we demonstrate the effectiveness of META-STORM through experiments across common deep learning tasks. Our algorithm improves upon the previous work STORM+ and is competitive with widely used algorithms after the addition of per-coordinate update and exponential moving average heuristics.

OCOct 3, 2022
High Probability Convergence for Accelerated Stochastic Mirror Descent

Alina Ene, Huy L. Nguyen

In this work, we describe a generic approach to show convergence with high probability for stochastic convex optimization. In previous works, either the convergence is only in expectation or the bound depends on the diameter of the domain. Instead, we show high probability convergence with bounds depending on the initial distance to the optimal solution as opposed to the domain diameter. The algorithms use step sizes analogous to the standard settings and are universal to Lipschitz functions, smooth functions, and their linear combinations.

DSFeb 12
Adaptive Power Iteration Method for Differentially Private PCA

Ta Duy Nguyen, Alina Ene, Huy Le Nguyen

We study $(ε,δ)$-differentially private algorithms for the problem of approximately computing the top singular vector of a matrix $A\in\mathbb{R}^{n\times d}$ where each row of $A$ is a datapoint in $\mathbb{R}^{d}$. In our privacy model, neighboring inputs differ by one single row/datapoint. We study the private variant of the power iteration method, which is widely adopted in practice. Our algorithm is based on a filtering technique which adapts to the coherence parameter of the input matrix. This technique provides a utility that goes beyond the worst-case guarantees for matrices with low coherence parameter. Our work departs from and complements the work by Hardt-Roth (STOC 2013) which designed a private power iteration method for the privacy model where neighboring inputs differ in one single entry by at most 1.

LGFeb 3, 2023
Online Ad Allocation with Predictions

Fabian Spaeh, Alina Ene

Display Ads and the generalized assignment problem are two well-studied online packing problems with important applications in ad allocation and other areas. In both problems, ad impressions arrive online and have to be allocated immediately to budget-constrained advertisers. Worst-case algorithms that achieve the ideal competitive ratio are known, but might act overly conservative given the predictable and usually tame nature of real-world input. Given this discrepancy, we develop an algorithm for both problems that incorporate machine-learned predictions and can thus improve the performance beyond the worst-case. Our algorithm is based on the work of Feldman et al. (2009) and similar in nature to Mahdian et al. (2007) who were the first to develop a learning-augmented algorithm for the related, but more structured Ad Words problem. We use a novel analysis to show that our algorithm is able to capitalize on a good prediction, while being robust against poor predictions. We experimentally evaluate our algorithm on synthetic and real-world data on a wide range of predictions. Our algorithm is consistently outperforming the worst-case algorithm without predictions.

63.1DSApr 29
Solving Positive Linear Programs with Differential Privacy

Alina Ene, Huy Le Nguyen, Ta Duy Nguyen et al.

We study differentially private approximation algorithms for positive linear programs (LPs with nonnegative coefficients and variables), focusing on the fundamental families of packing, covering, and mixed packing-covering formulations. We focus on the high-sensitivity, constraint-private regime of Hsu-Roth-Roughgarden-Ullman (ICALP 2014), where neighboring instances may differ by an arbitrary single constraint, so one cannot hope to approximately satisfy every constraint under privacy. We give private solvers that return approximate solutions while violating only a controlled number of constraints. Our algorithms improve the prior instance-dependent guarantees, and also yield new data-independent bounds that depend only on the dimension. Our techniques involve a dense multiplicative weights update method developed from a regularized dual viewpoint, which we analyze in a way that exploits structure specific to positive LPs.

DSMay 25, 2023
Online and Streaming Algorithms for Constrained $k$-Submodular Maximization

Fabian Spaeh, Alina Ene, Huy L. Nguyen

Constrained $k$-submodular maximization is a general framework that captures many discrete optimization problems such as ad allocation, influence maximization, personalized recommendation, and many others. In many of these applications, datasets are large or decisions need to be made in an online manner, which motivates the development of efficient streaming and online algorithms. In this work, we develop single-pass streaming and online algorithms for constrained $k$-submodular maximization with both monotone and general (possibly non-monotone) objectives subject to cardinality and knapsack constraints. Our algorithms achieve provable constant-factor approximation guarantees which improve upon the state of the art in almost all settings. Moreover, they are combinatorial and very efficient, and have optimal space and running time. We experimentally evaluate our algorithms on instances for ad allocation and other applications, where we observe that our algorithms are efficient and scalable, and construct solutions that are comparable in value to offline greedy algorithms.

OCJan 28, 2022
Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction

Zijian Liu, Ta Duy Nguyen, Alina Ene et al.

In this paper, we study the finite-sum convex optimization problem focusing on the general convex case. Recently, the study of variance reduced (VR) methods and their accelerated variants has made exciting progress. However, the step size used in the existing VR algorithms typically depends on the smoothness parameter, which is often unknown and requires tuning in practice. To address this problem, we propose two novel adaptive VR algorithms: Adaptive Variance Reduced Accelerated Extra-Gradient (AdaVRAE) and Adaptive Variance Reduced Accelerated Gradient (AdaVRAG). Our algorithms do not require knowledge of the smoothness parameter. AdaVRAE uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{nβ}ε}\right)$ gradient evaluations and AdaVRAG uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{nβ\logβ}ε}\right)$ gradient evaluations to attain an $\mathcal{O}(ε)$-suboptimal solution, where $n$ is the number of functions in the finite sum and $β$ is the smoothness parameter. This result matches the best-known convergence rate of non-adaptive VR methods and it improves upon the convergence of the state of the art adaptive VR method, AdaSVRG. We demonstrate the superior performance of our algorithms compared with previous methods in experiments on real-world datasets.

LGDec 22, 2020
Projection-Free Bandit Optimization with Privacy Guarantees

Alina Ene, Huy L. Nguyen, Adrian Vladu

We design differentially private algorithms for the bandit convex optimization problem in the projection-free setting. This setting is important whenever the decision set has a complex geometry, and access to it is done efficiently only through a linear optimization oracle, hence Euclidean projections are unavailable (e.g. matroid polytope, submodular base polytope). This is the first differentially-private algorithm for projection-free bandit optimization, and in fact our bound of $\widetilde{O}(T^{3/4})$ matches the best known non-private projection-free algorithm (Garber-Kretzu, AISTATS `20) and the best known private algorithm, even for the weaker setting when projections are available (Smith-Thakurta, NeurIPS `13).

LGOct 15, 2020
Adaptive and Universal Algorithms for Variational Inequalities with Optimal Convergence

Alina Ene, Huy L. Nguyen

We develop new adaptive algorithms for variational inequalities with monotone operators, which capture many problems of interest, notably convex optimization and convex-concave saddle point problems. Our algorithms automatically adapt to unknown problem parameters such as the smoothness and the norm of the operator, and the variance of the stochastic evaluation oracle. We show that our algorithms are universal and simultaneously achieve the optimal convergence rates in the non-smooth, smooth, and stochastic settings. The convergence guarantees of our algorithms improve over existing adaptive methods by a $Ω(\sqrt{\ln T})$ factor, matching the optimal non-adaptive algorithms. Additionally, prior works require that the optimization domain is bounded. In this work, we remove this restriction and give algorithms for unbounded domains that are adaptive and universal. Our general proof techniques can be used for many variants of the algorithm using one or two operator evaluations per iteration. The classical methods based on the ExtraGradient/MirrorProx algorithm require two operator evaluations per iteration, which is the dominant factor in the running time in many settings.

LGJul 17, 2020
Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities

Alina Ene, Huy L. Nguyen, Adrian Vladu

We provide new adaptive first-order methods for constrained convex optimization. Our main algorithms AdaACSA and AdaAGD+ are accelerated methods, which are universal in the sense that they achieve nearly-optimal convergence rates for both smooth and non-smooth functions, even when they only have access to stochastic gradients. In addition, they do not require any prior knowledge on how the objective function is parametrized, since they automatically adjust their per-coordinate learning rate. These can be seen as truly accelerated Adagrad methods for constrained optimization. We complement them with a simpler algorithm AdaGrad+ which enjoys the same features, and achieves the standard non-accelerated convergence rate. We also present a set of new results involving adaptive methods for unconstrained optimization and monotone operators.

DSMay 30, 2019
Parallel Algorithm for Non-Monotone DR-Submodular Maximization

Alina Ene, Huy L. Nguyen

In this work, we give a new parallel algorithm for the problem of maximizing a non-monotone diminishing returns submodular function subject to a cardinality constraint. For any desired accuracy $ε$, our algorithm achieves a $1/e - ε$ approximation using $O(\log{n} \log(1/ε) / ε^3)$ parallel rounds of function evaluations. The approximation guarantee nearly matches the best approximation guarantee known for the problem in the sequential setting and the number of parallel rounds is nearly-optimal for any constant $ε$. Previous algorithms achieve worse approximation guarantees using $Ω(\log^2{n})$ parallel rounds. Our experimental evaluation suggests that our algorithm obtains solutions whose objective value nearly matches the value obtained by the state of the art sequential algorithms, and it outperforms previous parallel algorithms in number of parallel rounds, iterations, and solution quality.

DSDec 4, 2018
A Parallel Double Greedy Algorithm for Submodular Maximization

Alina Ene, Huy L. Nguyen, Adrian Vladu

We study parallel algorithms for the problem of maximizing a non-negative submodular function. Our main result is an algorithm that achieves a nearly-optimal $1/2 -ε$ approximation using $O(\log(1/ε) / ε)$ parallel rounds of function evaluations. Our algorithm is based on a continuous variant of the double greedy algorithm of Buchbinder et al. that achieves the optimal $1/2$ approximation in the sequential setting. Our algorithm applies more generally to the problem of maximizing a continuous diminishing-returns (DR) function.

LGMar 6, 2017
Decomposable Submodular Function Minimization: Discrete and Continuous

Alina Ene, Huy L. Nguyen, László A. Végh

This paper investigates connections between discrete and continuous approaches for decomposable submodular function minimization. We provide improved running time estimates for the state-of-the-art continuous algorithms for the problem using combinatorial arguments. We also provide a systematic experimental comparison of the two types of methods, based on a clear distinction between level-0 and level-1 algorithms.

DSJun 27, 2016
A Reduction for Optimizing Lattice Submodular Functions with Diminishing Returns

Alina Ene, Huy L. Nguyen

A function $f: \mathbb{Z}_+^E \rightarrow \mathbb{R}_+$ is DR-submodular if it satisfies $f({\bf x} + χ_i) -f ({\bf x}) \ge f({\bf y} + χ_i) - f({\bf y})$ for all ${\bf x}\le {\bf y}, i\in E$. Recently, the problem of maximizing a DR-submodular function $f: \mathbb{Z}_+^E \rightarrow \mathbb{R}_+$ subject to a budget constraint $\|{\bf x}\|_1 \leq B$ as well as additional constraints has received significant attention \cite{SKIK14,SY15,MYK15,SY16}. In this note, we give a generic reduction from the DR-submodular setting to the submodular setting. The running time of the reduction and the size of the resulting submodular instance depends only \emph{logarithmically} on $B$. Using this reduction, one can translate the results for unconstrained and constrained submodular maximization to the DR-submodular setting for many types of constraints in a unified manner.

DSJul 14, 2015
A New Framework for Distributed Submodular Maximization

Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen et al.

A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. A lot of recent effort has been devoted to developing distributed algorithms for these problems. However, these results suffer from high number of rounds, suboptimal approximation ratios, or both. We develop a framework for bringing existing algorithms in the sequential setting to the distributed setting, achieving near optimal approximation ratios for many settings in only a constant number of MapReduce rounds. Our techniques also give a fast sequential algorithm for non-monotone maximization subject to a matroid constraint.

LGFeb 9, 2015
Random Coordinate Descent Methods for Minimizing Decomposable Submodular Functions

Alina Ene, Huy L. Nguyen

Submodular function minimization is a fundamental optimization problem that arises in several applications in machine learning and computer vision. The problem is known to be solvable in polynomial time, but general purpose algorithms have high running times and are unsuitable for large-scale problems. Recent work have used convex optimization techniques to obtain very practical algorithms for minimizing functions that are sums of ``simple" functions. In this paper, we use random coordinate descent methods to obtain algorithms with faster linear convergence rates and cheaper iteration costs. Compared to alternating projection methods, our algorithms do not rely on full-dimensional vector operations and they converge in significantly fewer iterations.

LGFeb 9, 2015
The Power of Randomization: Distributed Submodular Maximization on Massive Datasets

Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen et al.

A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We develop a simple distributed algorithm that is embarrassingly parallel and it achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.