MLMar 7, 2023
Polynomial Time and Private Learning of Unbounded Gaussian Mixture ModelsJamil Arbas, Hassan Ashtiani, Christopher Liaw
We study the problem of privately estimating the parameters of $d$-dimensional Gaussian Mixture Models (GMMs) with $k$ components. For this, we develop a technique to reduce the problem to its non-private counterpart. This allows us to privatize existing non-private algorithms in a blackbox manner, while incurring only a small overhead in the sample complexity and running time. As the main application of our framework, we develop an $(\varepsilon, δ)$-differentially private algorithm to learn GMMs using the non-private algorithm of Moitra and Valiant [MV10] as a blackbox. Consequently, this gives the first sample complexity upper bound and first polynomial time algorithm for privately learning GMMs without any boundedness assumptions on the parameters. As part of our analysis, we prove a tight (up to a constant factor) lower bound on the total variation distance of high-dimensional Gaussians which can be of independent interest.
GTFeb 16, 2023
User Response in Ad Auctions: An MDP Formulation of Long-Term Revenue OptimizationYang Cai, Zhe Feng, Christopher Liaw et al.
We propose a new Markov Decision Process (MDP) model for ad auctions to capture the user response to the quality of ads, with the objective of maximizing the long-term discounted revenue. By incorporating user response, our model takes into consideration all three parties involved in the auction (advertiser, auctioneer, and user). The state of the user is modeled as a user-specific click-through rate (CTR) with the CTR changing in the next round according to the set of ads shown to the user in the current round. We characterize the optimal mechanism for this MDP as a Myerson's auction with a notion of modified virtual value, which relies on the value distribution of the advertiser, the current user state, and the future impact of showing the ad to the user. Leveraging this characterization, we design a sample-efficient and computationally-efficient algorithm which outputs an approximately optimal policy that requires only sample access to the true MDP and the value distributions of the bidders. Finally, we propose a simple mechanism built upon second price auctions with personalized reserve prices and show it can achieve a constant-factor approximation to the optimal long term discounted revenue.
LGJun 1, 2022
Continuous Prediction with Experts' AdviceVictor Sanches Portella, Christopher Liaw, Nicholas J. A. Harvey
Prediction with experts' advice is one of the most fundamental problems in online learning and captures many of its technical challenges. A recent line of work has looked at online learning through the lens of differential equations and continuous-time analysis. This viewpoint has yielded optimal results for several problems in online learning. In this paper, we employ continuous-time stochastic calculus in order to study the discrete-time experts' problem. We use these tools to design a continuous-time, parameter-free algorithm with improved guarantees for the quantile regret. We then develop an analogous discrete-time algorithm with a very similar analysis and identical quantile regret bounds. Finally, we design an anytime continuous-time algorithm with regret matching the optimal fixed-time rate when the gains are independent Brownian Motions; in many settings, this is the most difficult case. This gives some evidence that, even with adversarial gains, the optimal anytime and fixed-time regrets may coincide.
MLSep 7, 2023
Mixtures of Gaussians are Privately Learnable with a Polynomial Number of SamplesMohammad Afzali, Hassan Ashtiani, Christopher Liaw
We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\text{poly}(k,d,1/α,1/\varepsilon,\log(1/δ))$ samples are sufficient to estimate a mixture of $k$ Gaussians in $\mathbb{R}^d$ up to total variation distance $α$ while satisfying $(\varepsilon, δ)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small'' cover (Bun et al., 2021) with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover (Aden-Ali et al., 2021b).
MLJul 5, 2024
Agnostic Private Density Estimation for GMMs via List Global StabilityMohammad Afzali, Hassan Ashtiani, Christopher Liaw
We consider the problem of private density estimation for mixtures of unrestricted high dimensional Gaussians in the agnostic setting. We prove the first upper bound on the sample complexity of this problem. Previously, private learnability of high dimensional GMMs was only known in the realizable setting [Afzali et al., 2024]. To prove our result, we exploit the notion of $\textit{list global stability}$ [Ghazi et al., 2021b,a] that was originally introduced in the context of private supervised learning. We define an agnostic variant of this definition, showing that its existence is sufficient for agnostic private density estimation. We then construct an agnostic list globally stable learner for GMMs.
IRFeb 29, 2024
Improved Online Learning Algorithms for CTR Prediction in Ad AuctionsZhe Feng, Christopher Liaw, Zixin Zhou
In this work, we investigate the online learning problem of revenue maximization in ad auctions, where the seller needs to learn the click-through rates (CTRs) of each ad candidate and charge the price of the winner through a pay-per-click manner. We focus on two models of the advertisers' strategic behaviors. First, we assume that the advertiser is completely myopic; i.e.~in each round, they aim to maximize their utility only for the current round. In this setting, we develop an online mechanism based on upper-confidence bounds that achieves a tight $O(\sqrt{T})$ regret in the worst-case and negative regret when the values are static across all the auctions and there is a gap between the highest expected value (i.e.~value multiplied by their CTR) and second highest expected value ad. Next, we assume that the advertiser is non-myopic and cares about their long term utility. This setting is much more complex since an advertiser is incentivized to influence the mechanism by bidding strategically in earlier rounds. In this setting, we provide an algorithm to achieve negative regret for the static valuation setting (with a positive gap), which is in sharp contrast with the prior work that shows $O(T^{2/3})$ regret when the valuation is generated by adversary.
DSOct 24, 2025
A Unified Approach to Submodular Maximization Under NoiseKshipra Bhawalkar, Yang Cai, Zhe Feng et al.
We consider the problem of maximizing a submodular function with access to a noisy value oracle for the function instead of an exact value oracle. Similar to prior work, we assume that the noisy oracle is persistent in that multiple calls to the oracle for a specific set always return the same value. In this model, Hassidim and Singer (2017) design a $(1-1/e)$-approximation algorithm for monotone submodular maximization subject to a cardinality constraint, and Huang et al (2022) design a $(1-1/e)/2$-approximation algorithm for monotone submodular maximization subject to any arbitrary matroid constraint. In this paper, we design a meta-algorithm that allows us to take any "robust" algorithm for exact submodular maximization as a black box and transform it into an algorithm for the noisy setting while retaining the approximation guarantee. By using the meta-algorithm with the measured continuous greedy algorithm, we obtain a $(1-1/e)$-approximation (resp. $1/e$-approximation) for monotone (resp. non-monotone) submodular maximization subject to a matroid constraint under noise. Furthermore, by using the meta-algorithm with the double greedy algorithm, we obtain a $1/2$-approximation for unconstrained (non-monotone) submodular maximization under noise.
MLNov 22, 2021
Private and polynomial time algorithms for learning Gaussians and beyondHassan Ashtiani, Christopher Liaw
We present a fairly general framework for reducing $(\varepsilon, δ)$ differentially private (DP) statistical estimation to its non-private counterpart. As the main application of this framework, we give a polynomial time and $(\varepsilon,δ)$-DP algorithm for learning (unrestricted) Gaussian distributions in $\mathbb{R}^d$. The sample complexity of our approach for learning the Gaussian up to total variation distance $α$ is $\widetilde{O}(d^2/α^2 + d^2\sqrt{\ln(1/δ)}/α\varepsilon + d\ln(1/δ) / α\varepsilon)$ matching (up to logarithmic factors) the best known information-theoretic (non-efficient) sample complexity upper bound due to Aden-Ali, Ashtiani, and Kamath (ALT'21). In an independent work, Kamath, Mouzakis, Singhal, Steinke, and Ullman (arXiv:2111.04609) proved a similar result using a different approach and with $O(d^{5/2})$ sample complexity dependence on $d$. As another application of our framework, we provide the first polynomial time $(\varepsilon, δ)$-DP algorithm for robust learning of (unrestricted) Gaussians with sample complexity $\widetilde{O}(d^{3.5})$. In another independent work, Kothari, Manurangsi, and Velingker (arXiv:2112.03548) also provided a polynomial time $(\varepsilon, δ)$-DP algorithm for robust learning of Gaussians with sample complexity $\widetilde{O}(d^8)$.
LGJun 3, 2021
Privately Learning Mixtures of Axis-Aligned GaussiansIshaq Aden-Ali, Hassan Ashtiani, Christopher Liaw
We consider the problem of learning mixtures of Gaussians under the constraint of approximate differential privacy. We prove that $\widetilde{O}(k^2 d \log^{3/2}(1/δ) / α^2 \varepsilon)$ samples are sufficient to learn a mixture of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ to within total variation distance $α$ while satisfying $(\varepsilon, δ)$-differential privacy. This is the first result for privately learning mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians. If the covariance matrices of each of the Gaussians is the identity matrix, we show that $\widetilde{O}(kd/α^2 + kd \log(1/δ) / α\varepsilon)$ samples are sufficient. Recently, the "local covering" technique of Bun, Kamath, Steinke, and Wu has been successfully used for privately learning high-dimensional Gaussians with a known covariance matrix and extended to privately learning general high-dimensional Gaussians by Aden-Ali, Ashtiani, and Kamath. Given these positive results, this approach has been proposed as a promising direction for privately learning mixtures of Gaussians. Unfortunately, we show that this is not possible. We design a new technique for privately learning mixture distributions. A class of distributions $\mathcal{F}$ is said to be list-decodable if there is an algorithm that, given "heavily corrupted" samples from $f\in \mathcal{F}$, outputs a list of distributions, $\widehat{\mathcal{F}}$, such that one of the distributions in $\widehat{\mathcal{F}}$ approximates $f$. We show that if $\mathcal{F}$ is privately list-decodable, then we can privately learn mixtures of distributions in $\mathcal{F}$. Finally, we show axis-aligned Gaussian distributions are privately list-decodable, thereby proving mixtures of such distributions are privately learnable.
LGFeb 20, 2020
Optimal anytime regret with two expertsNicholas J. A. Harvey, Christopher Liaw, Edwin Perkins et al.
We consider the classical problem of prediction with expert advice. In the fixed-time setting, where the time horizon is known in advance, algorithms that achieve the optimal regret are known when there are two, three, or four experts or when the number of experts is large. Much less is known about the problem in the anytime setting, where the time horizon is not known in advance. No minimax optimal algorithm was previously known in the anytime setting, regardless of the number of experts. Even for the case of two experts, Luo and Schapire have left open the problem of determining the optimal algorithm. We design the first minimax optimal algorithm for minimizing regret in the anytime setting. We consider the case of two experts, and prove that the optimal regret is $γ\sqrt{t} / 2$ at all time steps $t$, where $γ$ is a natural constant that arose 35 years ago in studying fundamental properties of Brownian motion. The algorithm is designed by considering a continuous analogue of the regret problem, which is solved using ideas from stochastic calculus.
LGSep 2, 2019
Simple and optimal high-probability bounds for strongly-convex stochastic gradient descentNicholas J. A. Harvey, Christopher Liaw, Sikander Randhawa
We consider stochastic gradient descent algorithms for minimizing a non-smooth, strongly-convex function. Several forms of this algorithm, including suffix averaging, are known to achieve the optimal $O(1/T)$ convergence rate in expectation. We consider a simple, non-uniform averaging strategy of Lacoste-Julien et al. (2011) and prove that it achieves the optimal $O(1/T)$ convergence rate with high probability. Our proof uses a recently developed generalization of Freedman's inequality. Finally, we compare several of these algorithms experimentally and show that this non-uniform averaging strategy outperforms many standard techniques, and with smaller variance.
LGDec 13, 2018
Tight Analyses for Non-Smooth Stochastic Gradient DescentNicholas J. A. Harvey, Christopher Liaw, Yaniv Plan et al.
Consider the problem of minimizing functions that are Lipschitz and strongly convex, but not necessarily differentiable. We prove that after $T$ steps of stochastic gradient descent, the error of the final iterate is $O(\log(T)/T)$ with high probability. We also construct a function from this class for which the error of the final iterate of deterministic gradient descent is $Ω(\log(T)/T)$. This shows that the upper bound is tight and that, in this setting, the last iterate of stochastic gradient descent has the same general error rate (with high probability) as deterministic gradient descent. This resolves both open questions posed by Shamir (2012). An intermediate step of our analysis proves that the suffix averaging method achieves error $O(1/T)$ with high probability, which is optimal (for any first-order optimization method). This improves results of Rakhlin (2012) and Hazan and Kale (2014), both of which achieved error $O(1/T)$, but only in expectation, and achieved a high probability error bound of $O(\log \log(T)/T)$, which is suboptimal. We prove analogous results for functions that are Lipschitz and convex, but not necessarily strongly convex or differentiable. After $T$ steps of stochastic gradient descent, the error of the final iterate is $O(\log(T)/\sqrt{T})$ with high probability, and there exists a function for which the error of the final iterate of deterministic gradient descent is $Ω(\log(T)/\sqrt{T})$.
LGOct 14, 2017
Near-optimal Sample Complexity Bounds for Robust Learning of Gaussians Mixtures via Compression SchemesHassan Ashtiani, Shai Ben-David, Nick Harvey et al.
We prove that $\tildeΘ(k d^2 / \varepsilon^2)$ samples are necessary and sufficient for learning a mixture of $k$ Gaussians in $\mathbb{R}^d$, up to error $\varepsilon$ in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that $\tilde{O}(k d / \varepsilon^2)$ samples suffice, matching a known lower bound. Moreover, these results hold in the agnostic-learning/robust-estimation setting as well, where the target distribution is only approximately a mixture of Gaussians. The upper bound is shown using a novel technique for distribution learning based on a notion of `compression.' Any class of distributions that allows such a compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in $\mathbb{R}^d$ admits a small-sized compression scheme.