MLJun 14, 2022Code
Benefits of Additive Noise in Composing Classes with Bounded CapacityAlireza Fathollah Pour, Hassan Ashtiani
We observe that given two (compatible) classes of functions $\mathcal{F}$ and $\mathcal{H}$ with small capacity as measured by their uniform covering numbers, the capacity of the composition class $\mathcal{H} \circ \mathcal{F}$ can become prohibitively large or even unbounded. We then show that adding a small amount of Gaussian noise to the output of $\mathcal{F}$ before composing it with $\mathcal{H}$ can effectively control the capacity of $\mathcal{H} \circ \mathcal{F}$, offering a general recipe for modular design. To prove our results, we define new notions of uniform covering number of random functions with respect to the total variation and Wasserstein distances. We instantiate our results for the case of multi-layer sigmoid neural networks. Preliminary empirical results on MNIST dataset indicate that the amount of noise required to improve over existing uniform bounds can be numerically negligible (i.e., element-wise i.i.d. Gaussian noise with standard deviation $10^{-240}$). The source codes are available at https://github.com/fathollahpour/composition_noise.
MLMar 7, 2023
Polynomial Time and Private Learning of Unbounded Gaussian Mixture ModelsJamil Arbas, Hassan Ashtiani, Christopher Liaw
We study the problem of privately estimating the parameters of $d$-dimensional Gaussian Mixture Models (GMMs) with $k$ components. For this, we develop a technique to reduce the problem to its non-private counterpart. This allows us to privatize existing non-private algorithms in a blackbox manner, while incurring only a small overhead in the sample complexity and running time. As the main application of our framework, we develop an $(\varepsilon, δ)$-differentially private algorithm to learn GMMs using the non-private algorithm of Moitra and Valiant [MV10] as a blackbox. Consequently, this gives the first sample complexity upper bound and first polynomial time algorithm for privately learning GMMs without any boundedness assumptions on the parameters. As part of our analysis, we prove a tight (up to a constant factor) lower bound on the total variation distance of high-dimensional Gaussians which can be of independent interest.
MLMar 2, 2022
Adversarially Robust Learning with ToleranceHassan Ashtiani, Vinayak Pathak, Ruth Urner
We initiate the study of tolerant adversarial PAC-learning with respect to metric perturbation sets. In adversarial PAC-learning, an adversary is allowed to replace a test point $x$ with an arbitrary point in a closed ball of radius $r$ centered at $x$. In the tolerant version, the error of the learner is compared with the best achievable error with respect to a slightly larger perturbation radius $(1+γ)r$. This simple tweak helps us bridge the gap between theory and practice and obtain the first PAC-type guarantees for algorithmic techniques that are popular in practice. Our first result concerns the widely-used ``perturb-and-smooth'' approach for adversarial learning. For perturbation sets with doubling dimension $d$, we show that a variant of these approaches PAC-learns any hypothesis class $\mathcal{H}$ with VC-dimension $v$ in the $γ$-tolerant adversarial setting with $O\left(\frac{v(1+1/γ)^{O(d)}}{\varepsilon}\right)$ samples. This is in contrast to the traditional (non-tolerant) setting in which, as we show, the perturb-and-smooth approach can provably fail. Our second result shows that one can PAC-learn the same class using $\widetilde{O}\left(\frac{d.v\log(1+1/γ)}{\varepsilon^2}\right)$ samples even in the agnostic setting. This result is based on a novel compression-based algorithm, and achieves a linear dependence on the doubling dimension as well as the VC-dimension. This is in contrast to the non-tolerant setting where there is no known sample complexity upper bound that depend polynomially on the VC-dimension.
MLSep 7, 2023
Mixtures of Gaussians are Privately Learnable with a Polynomial Number of SamplesMohammad Afzali, Hassan Ashtiani, Christopher Liaw
We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\text{poly}(k,d,1/α,1/\varepsilon,\log(1/δ))$ samples are sufficient to estimate a mixture of $k$ Gaussians in $\mathbb{R}^d$ up to total variation distance $α$ while satisfying $(\varepsilon, δ)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small'' cover (Bun et al., 2021) with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover (Aden-Ali et al., 2021b).
MLJul 5, 2024
Agnostic Private Density Estimation for GMMs via List Global StabilityMohammad Afzali, Hassan Ashtiani, Christopher Liaw
We consider the problem of private density estimation for mixtures of unrestricted high dimensional Gaussians in the agnostic setting. We prove the first upper bound on the sample complexity of this problem. Previously, private learnability of high dimensional GMMs was only known in the realizable setting [Afzali et al., 2024]. To prove our result, we exploit the notion of $\textit{list global stability}$ [Ghazi et al., 2021b,a] that was originally introduced in the context of private supervised learning. We define an agnostic variant of this definition, showing that its existence is sufficient for agnostic private density estimation. We then construct an agnostic list globally stable learner for GMMs.
MLDec 9, 2023
Sample-Optimal Locally Private Hypothesis Selection and the Provable Benefits of InteractivityAlireza F. Pour, Hassan Ashtiani, Shahab Asoodeh
We study the problem of hypothesis selection under the constraint of local differential privacy. Given a class $\mathcal{F}$ of $k$ distributions and a set of i.i.d. samples from an unknown distribution $h$, the goal of hypothesis selection is to pick a distribution $\hat{f}$ whose total variation distance to $h$ is comparable with the best distribution in $\mathcal{F}$ (with high probability). We devise an $\varepsilon$-locally-differentially-private ($\varepsilon$-LDP) algorithm that uses $Θ\left(\frac{k}{α^2\min \{\varepsilon^2,1\}}\right)$ samples to guarantee that $d_{TV}(h,\hat{f})\leq α+ 9 \min_{f\in \mathcal{F}}d_{TV}(h,f)$ with high probability. This sample complexity is optimal for $\varepsilon<1$, matching the lower bound of Gopi et al. (2020). All previously known algorithms for this problem required $Ω\left(\frac{k\log k}{α^2\min \{ \varepsilon^2 ,1\}} \right)$ samples to work. Moreover, our result demonstrates the power of interaction for $\varepsilon$-LDP hypothesis selection. Namely, it breaks the known lower bound of $Ω\left(\frac{k\log k}{α^2\min \{ \varepsilon^2 ,1\}} \right)$ for the sample complexity of non-interactive hypothesis selection. Our algorithm breaks this barrier using only $Θ(\log \log k)$ rounds of interaction. To prove our results, we define the notion of \emph{critical queries} for a Statistical Query Algorithm (SQA) which may be of independent interest. Informally, an SQA is said to use a small number of critical queries if its success relies on the accuracy of only a small number of queries it asks. We then design an LDP algorithm that uses a smaller number of critical queries.
LGFeb 11, 2025
Simplifying Adversarially Robust PAC Learning with ToleranceHassan Ashtiani, Vinayak Pathak, Ruth Urner
Adversarially robust PAC learning has proved to be challenging, with the currently best known learners [Montasser et al., 2021a] relying on improper methods based on intricate compression schemes, resulting in sample complexity exponential in the VC-dimension. A series of follow up work considered a slightly relaxed version of the problem called adversarially robust learning with tolerance [Ashtiani et al., 2023, Bhattacharjee et al., 2023, Raman et al., 2024] and achieved better sample complexity in terms of the VC-dimension. However, those algorithms were either improper and complex, or required additional assumptions on the hypothesis class H. We prove, for the first time, the existence of a simpler learner that achieves a sample complexity linear in the VC-dimension without requiring additional assumptions on H. Even though our learner is improper, it is "almost proper" in the sense that it outputs a hypothesis that is "similar" to a hypothesis in H. We also use the ideas from our algorithm to construct a semi-supervised learner in the tolerant setting. This simple algorithm achieves comparable bounds to the previous (non-tolerant) semi-supervised algorithm of Attias et al. [2022a], but avoids the use of intricate subroutines from previous works, and is "almost proper."
LGNov 4, 2024
Sample-Efficient Private Learning of Mixtures of GaussiansHassan Ashtiani, Mahbod Majid, Shyam Narayanan
We study the problem of learning mixtures of Gaussians with approximate differential privacy. We prove that roughly $kd^2 + k^{1.5} d^{1.75} + k^2 d$ samples suffice to learn a mixture of $k$ arbitrary $d$-dimensional Gaussians up to low total variation distance, with differential privacy. Our work improves over the previous best result [AAL24b] (which required roughly $k^2 d^4$ samples) and is provably optimal when $d$ is much larger than $k^2$. Moreover, we give the first optimal bound for privately learning mixtures of $k$ univariate (i.e., $1$-dimensional) Gaussians. Importantly, we show that the sample complexity for privately learning mixtures of univariate Gaussians is linear in the number of components $k$, whereas the previous best sample complexity [AAL21] was quadratic in $k$. Our algorithms utilize various techniques, including the inverse sensitivity mechanism [AD20b, AD20a, HKMN23], sample compression for distributions [ABDH+20], and methods for bounding volumes of sumsets.
MLMay 28, 2023
On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long SequencesAlireza Fathollah Pour, Hassan Ashtiani
We consider the class of noisy multi-layered sigmoid recurrent neural networks with $w$ (unbounded) weights for classification of sequences of length $T$, where independent noise distributed according to $\mathcal{N}(0,σ^2)$ is added to the output of each neuron in the network. Our main result shows that the sample complexity of PAC learning this class can be bounded by $O (w\log(T/σ))$. For the non-noisy version of the same class (i.e., $σ=0$), we prove a lower bound of $Ω(wT)$ for the sample complexity. Our results indicate an exponential gap in the dependence of sample complexity on $T$ for noisy versus non-noisy networks. Moreover, given the mild logarithmic dependence of the upper bound on $1/σ$, this gap still holds even for numerically negligible values of $σ$.
MLNov 22, 2021
Private and polynomial time algorithms for learning Gaussians and beyondHassan Ashtiani, Christopher Liaw
We present a fairly general framework for reducing $(\varepsilon, δ)$ differentially private (DP) statistical estimation to its non-private counterpart. As the main application of this framework, we give a polynomial time and $(\varepsilon,δ)$-DP algorithm for learning (unrestricted) Gaussian distributions in $\mathbb{R}^d$. The sample complexity of our approach for learning the Gaussian up to total variation distance $α$ is $\widetilde{O}(d^2/α^2 + d^2\sqrt{\ln(1/δ)}/α\varepsilon + d\ln(1/δ) / α\varepsilon)$ matching (up to logarithmic factors) the best known information-theoretic (non-efficient) sample complexity upper bound due to Aden-Ali, Ashtiani, and Kamath (ALT'21). In an independent work, Kamath, Mouzakis, Singhal, Steinke, and Ullman (arXiv:2111.04609) proved a similar result using a different approach and with $O(d^{5/2})$ sample complexity dependence on $d$. As another application of our framework, we provide the first polynomial time $(\varepsilon, δ)$-DP algorithm for robust learning of (unrestricted) Gaussians with sample complexity $\widetilde{O}(d^{3.5})$. In another independent work, Kothari, Manurangsi, and Velingker (arXiv:2112.03548) also provided a polynomial time $(\varepsilon, δ)$-DP algorithm for robust learning of Gaussians with sample complexity $\widetilde{O}(d^8)$.
LGJun 3, 2021
Privately Learning Mixtures of Axis-Aligned GaussiansIshaq Aden-Ali, Hassan Ashtiani, Christopher Liaw
We consider the problem of learning mixtures of Gaussians under the constraint of approximate differential privacy. We prove that $\widetilde{O}(k^2 d \log^{3/2}(1/δ) / α^2 \varepsilon)$ samples are sufficient to learn a mixture of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ to within total variation distance $α$ while satisfying $(\varepsilon, δ)$-differential privacy. This is the first result for privately learning mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians. If the covariance matrices of each of the Gaussians is the identity matrix, we show that $\widetilde{O}(kd/α^2 + kd \log(1/δ) / α\varepsilon)$ samples are sufficient. Recently, the "local covering" technique of Bun, Kamath, Steinke, and Wu has been successfully used for privately learning high-dimensional Gaussians with a known covariance matrix and extended to privately learning general high-dimensional Gaussians by Aden-Ali, Ashtiani, and Kamath. Given these positive results, this approach has been proposed as a promising direction for privately learning mixtures of Gaussians. Unfortunately, we show that this is not possible. We design a new technique for privately learning mixture distributions. A class of distributions $\mathcal{F}$ is said to be list-decodable if there is an algorithm that, given "heavily corrupted" samples from $f\in \mathcal{F}$, outputs a list of distributions, $\widehat{\mathcal{F}}$, such that one of the distributions in $\widehat{\mathcal{F}}$ approximates $f$. We show that if $\mathcal{F}$ is privately list-decodable, then we can privately learn mixtures of distributions in $\mathcal{F}$. Finally, we show axis-aligned Gaussian distributions are privately list-decodable, thereby proving mixtures of such distributions are privately learnable.
MLOct 19, 2020
On the Sample Complexity of Privately Learning Unbounded High-Dimensional GaussiansIshaq Aden-Ali, Hassan Ashtiani, Gautam Kamath
We provide sample complexity upper bounds for agnostically learning multivariate Gaussians under the constraint of approximate differential privacy. These are the first finite sample upper bounds for general Gaussians which do not impose restrictions on the parameters of the distribution. Our bounds are near-optimal in the case when the covariance is known to be the identity, and conjectured to be near-optimal in the general case. From a technical standpoint, we provide analytic tools for arguing the existence of global "locally small" covers from local covers of the space. These are exploited using modifications of recent techniques for differentially private hypothesis selection. Our techniques may prove useful for privately learning other distribution classes which do not possess a finite cover.
MLJun 30, 2020
Black-box Certification and Learning under Adversarial PerturbationsHassan Ashtiani, Vinayak Pathak, Ruth Urner
We formally study the problem of classification under adversarial perturbations from a learner's perspective as well as a third-party who aims at certifying the robustness of a given black-box classifier. We analyze a PAC-type framework of semi-supervised learning and identify possibility and impossibility results for proper learning of VC-classes in this setting. We further introduce a new setting of black-box certification under limited query budget, and analyze this for various classes of predictors and perturbation. We also consider the viewpoint of a black-box adversary that aims at finding adversarial examples, showing that the existence of an adversary with polynomial query complexity can imply the existence of a sample efficient robust learner.
LGDec 5, 2019
On the Sample Complexity of Learning Sum-Product NetworksIshaq Aden-Ali, Hassan Ashtiani
Sum-Product Networks (SPNs) can be regarded as a form of deep graphical models that compactly represent deeply factored and mixed distributions. An SPN is a rooted directed acyclic graph (DAG) consisting of a set of leaves (corresponding to base distributions), a set of sum nodes (which represent mixtures of their children distributions) and a set of product nodes (representing the products of its children distributions). In this work, we initiate the study of the sample complexity of PAC-learning the set of distributions that correspond to SPNs. We show that the sample complexity of learning tree structured SPNs with the usual type of leaves (i.e., Gaussian or discrete) grows at most linearly (up to logarithmic factors) with the number of parameters of the SPN. More specifically, we show that the class of distributions that corresponds to tree structured Gaussian SPNs with $k$ mixing weights and $e$ ($d$-dimensional Gaussian) leaves can be learned within Total Variation error $ε$ using at most $\widetilde{O}(\frac{ed^2+k}{ε^2})$ samples. A similar result holds for tree structured SPNs with discrete leaves. We obtain the upper bounds based on the recently proposed notion of distribution compression schemes. More specifically, we show that if a (base) class of distributions $\mathcal{F}$ admits an "efficient" compression, then the class of tree structured SPNs with leaves from $\mathcal{F}$ also admits an efficient compression.
STJan 11, 2018
Some techniques in density estimationHassan Ashtiani, Abbas Mehrabian
Density estimation is an interdisciplinary topic at the intersection of statistics, theoretical computer science and machine learning. We review some old and new techniques for bounding the sample complexity of estimating densities of continuous distributions, focusing on the class of mixtures of Gaussians and its subclasses. In particular, we review the main techniques used to prove the new sample complexity bounds for mixtures of Gaussians by Ashtiani, Ben-David, Harvey, Liaw, Mehrabian, and Plan arXiv:1710.05209.
LGOct 14, 2017
Near-optimal Sample Complexity Bounds for Robust Learning of Gaussians Mixtures via Compression SchemesHassan Ashtiani, Shai Ben-David, Nick Harvey et al.
We prove that $\tildeΘ(k d^2 / \varepsilon^2)$ samples are necessary and sufficient for learning a mixture of $k$ Gaussians in $\mathbb{R}^d$, up to error $\varepsilon$ in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that $\tilde{O}(k d / \varepsilon^2)$ samples suffice, matching a known lower bound. Moreover, these results hold in the agnostic-learning/robust-estimation setting as well, where the target distribution is only approximately a mixture of Gaussians. The upper bound is shown using a novel technique for distribution learning based on a notion of `compression.' Any class of distributions that allows such a compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in $\mathbb{R}^d$ admits a small-sized compression scheme.
LGJun 6, 2017
Sample-Efficient Learning of MixturesHassan Ashtiani, Shai Ben-David, Abbas Mehrabian
We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution, and want to output a distribution that is close to the target in total variation distance. Let $\mathcal F$ be an arbitrary class of probability distributions, and let $\mathcal{F}^k$ denote the class of $k$-mixtures of elements of $\mathcal F$. Assuming the existence of a method for learning $\mathcal F$ with sample complexity $m_{\mathcal{F}}(ε)$, we provide a method for learning $\mathcal F^k$ with sample complexity $O({k\log k \cdot m_{\mathcal F}(ε) }/{ε^{2}})$. Our mixture learning algorithm has the property that, if the $\mathcal F$-learner is proper/agnostic, then the $\mathcal F^k$-learner would be proper/agnostic as well. This general result enables us to improve the best known sample complexity upper bounds for a variety of important mixture classes. First, we show that the class of mixtures of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with $\widetilde{O}({kd}/{ε^ 4})$ samples, which is tight in $k$ and $d$ up to logarithmic factors. Second, we show that the class of mixtures of $k$ Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with sample complexity $\widetilde{O}({kd^2}/{ε^ 4})$, which improves the previous known bounds of $\widetilde{O}({k^3d^2}/{ε^ 4})$ and $\widetilde{O}(k^4d^4/ε^ 2)$ in its dependence on $k$ and $d$. Finally, we show that the class of mixtures of $k$ log-concave distributions over $\mathbb{R}^d$ is PAC-learnable using $\widetilde{O}(d^{(d+5)/2}ε^{-(d+9)/2}k)$ samples.
LGJun 8, 2016
Clustering with Same-Cluster QueriesHassan Ashtiani, Shrinu Kushagra, Shai Ben-David
We propose a framework for Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to interact with a domain expert, asking whether two given instances belong to the same cluster or not. We study the query and computational complexity of clustering in this framework. We consider a setting where the expert conforms to a center-based clustering with a notion of margin. We show that there is a trade off between computational complexity and query complexity; We prove that for the case of $k$-means clustering (i.e., when the expert conforms to a solution of $k$-means), having access to relatively few such queries allows efficient solutions to otherwise NP hard problems. In particular, we provide a probabilistic polynomial-time (BPP) algorithm for clustering in this setting that asks $O\big(k^2\log k + k\log n)$ same-cluster queries and runs with time complexity $O\big(kn\log n)$ (where $k$ is the number of clusters and $n$ is the number of instances). The algorithm succeeds with high probability for data satisfying margin conditions under which, without queries, we show that the problem is NP hard. We also prove a lower bound on the number of queries needed to have a computationally efficient clustering algorithm in this setting.
MLJun 19, 2015
Representation Learning for Clustering: A Statistical FrameworkHassan Ashtiani, Shai Ben-David
We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to come up with a data representation under which $k$-means clustering results in a clustering (of the full data set) that is aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation with this paradigm. We then introduce a notion of capacity of a class of possible representations, in the spirit of the VC-dimension, showing that classes of representations that have finite such dimension can be successfully learned with sample size error bounds, and end our discussion with an analysis of that dimension for classes of representations induced by linear embeddings.