Alistair Stewart

DS
21papers
2,488citations
Novelty68%
AI Score32

21 Papers

DSJun 17, 2021
Statistical Query Lower Bounds for List-Decodable Linear Regression

Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia et al.

We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples. Specifically, we are given a set $T$ of labeled examples $(x, y) \in \mathbb{R}^d \times \mathbb{R}$ and a parameter $0< α<1/2$ such that an $α$-fraction of the points in $T$ are i.i.d. samples from a linear regression model with Gaussian covariates, and the remaining $(1-α)$-fraction of the points are drawn from an arbitrary noise distribution. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the target regression vector. Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/α)}$ for this problem. Our SQ lower bound qualitatively matches the performance of previously developed algorithms, providing evidence that current upper bounds for this task are nearly best possible.

LGFeb 3, 2021
Outlier-Robust Learning of Ising Models Under Dobrushin's Condition

Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart et al.

We study the problem of learning Ising models satisfying Dobrushin's condition in the outlier-robust setting where a constant fraction of the samples are adversarially corrupted. Our main result is to provide the first computationally efficient robust learning algorithm for this problem with near-optimal error guarantees. Our algorithm can be seen as a special case of an algorithm for robustly learning a distribution from a general exponential family. To prove its correctness for Ising models, we establish new anti-concentration results for degree-$2$ polynomials of Ising models that may be of independent interest.

CRMay 27, 2020
Overview of Polkadot and its Design Considerations

Jeff Burdges, Alfonso Cevallos, Peter Czaban et al.

In this paper we describe the design components of the heterogenous multi-chain protocol Polkadot and explain how these components help Polkadot address some of the existing shortcomings of blockchain technologies. At present, a vast number of blockchain projects have been introduced and employed with various features that are not necessarily designed to work with each other. This makes it difficult for users to utilise a large number of applications on different blockchain projects. Moreover, with the increase in number of projects the security that each one is providing individually becomes weaker. Polkadot aims to provide a scalable and interoperable framework for multiple chains with pooled security that is achieved by the collection of components described in this paper.

DSNov 19, 2019
Outlier-Robust High-Dimensional Sparse Estimation via Iterative Filtering

Ilias Diakonikolas, Sushrut Karmalkar, Daniel Kane et al.

We study high-dimensional sparse estimation tasks in a robust setting where a constant fraction of the dataset is adversarially corrupted. Specifically, we focus on the fundamental problems of robust sparse mean estimation and robust sparse PCA. We give the first practically viable robust estimators for these problems. In more detail, our algorithms are sample and computationally efficient and achieve near-optimal robustness guarantees. In contrast to prior provable algorithms which relied on the ellipsoid method, our algorithms use spectral techniques to iteratively remove outliers from the dataset. Our experimental evaluation on synthetic data shows that our algorithms are scalable and significantly outperform a range of previous approaches, nearly matching the best error rate without corruptions.

LGMay 31, 2018
Efficient Algorithms and Lower Bounds for Robust Linear Regression

Ilias Diakonikolas, Weihao Kong, Alistair Stewart

We study the problem of high-dimensional linear regression in a robust model where an $ε$-fraction of the samples can be adversarially corrupted. We focus on the fundamental setting where the covariates of the uncorrupted samples are drawn from a Gaussian distribution $\mathcal{N}(0, Σ)$ on $\mathbb{R}^d$. We give nearly tight upper bounds and computational lower bounds for this problem. Specifically, our main contributions are as follows: For the case that the covariance matrix is known to be the identity, we give a sample near-optimal and computationally efficient algorithm that outputs a candidate hypothesis vector $\widehatβ$ which approximates the unknown regression vector $β$ within $\ell_2$-norm $O(ε\log(1/ε) σ)$, where $σ$ is the standard deviation of the random observation noise. An error of $Ω(εσ)$ is information-theoretically necessary, even with infinite sample size. Prior work gave an algorithm for this problem with sample complexity $\tildeΩ(d^2/ε^2)$ whose error guarantee scales with the $\ell_2$-norm of $β$. For the case of unknown covariance, we show that we can efficiently achieve the same error guarantee as in the known covariance case using an additional $\tilde{O}(d^2/ε^2)$ unlabeled examples. On the other hand, an error of $O(εσ)$ can be information-theoretically attained with $O(d/ε^2)$ samples. We prove a Statistical Query (SQ) lower bound providing evidence that this quadratic tradeoff in the sample size is inherent. More specifically, we show that any polynomial time SQ learning algorithm for robust linear regression (in Huber's contamination model) with estimation complexity $O(d^{2-c})$, where $c>0$ is an arbitrarily small constant, must incur an error of $Ω(\sqrtε σ)$.

LGMar 7, 2018
Sever: A Robust Meta-Algorithm for Stochastic Optimization

Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane et al.

In high dimensions, most machine learning methods are brittle to even a small fraction of structured outliers. To address this, we introduce a new meta-algorithm that can take in a base learner such as least squares or stochastic gradient descent, and harden the learner to be resistant to outliers. Our method, Sever, possesses strong theoretical guarantees yet is also highly scalable -- beyond running the base learner itself, it only requires computing the top singular vector of a certain $n \times d$ matrix. We apply Sever on a drug design dataset and a spam classification dataset, and find that in both cases it has substantially greater robustness than several baselines. On the spam dataset, with $1\%$ corruptions, we achieved $7.4\%$ test error, compared to $13.4\%-20.5\%$ for the baselines, and $3\%$ error on the uncorrupted dataset. Similarly, on the drug design dataset, with $10\%$ corruptions, we achieved $1.42$ mean-squared error test error, compared to $1.51$-$2.33$ for the baselines, and $1.23$ error on the uncorrupted dataset.

STFeb 28, 2018
Near-Optimal Sample Complexity Bounds for Maximum Likelihood Estimation of Multivariate Log-concave Densities

Timothy Carpenter, Ilias Diakonikolas, Anastasios Sidiropoulos et al.

We study the problem of learning multivariate log-concave densities with respect to a global loss function. We obtain the first upper bound on the sample complexity of the maximum likelihood estimator (MLE) for a log-concave density on $\mathbb{R}^d$, for all $d \geq 4$. Prior to this work, no finite sample upper bound was known for this estimator in more than $3$ dimensions. In more detail, we prove that for any $d \geq 1$ and $ε>0$, given $\tilde{O}_d((1/ε)^{(d+3)/2})$ samples drawn from an unknown log-concave density $f_0$ on $\mathbb{R}^d$, the MLE outputs a hypothesis $h$ that with high probability is $ε$-close to $f_0$, in squared Hellinger loss. A sample complexity lower bound of $Ω_d((1/ε)^{(d+1)/2})$ was previously known for any learning algorithm that achieves this guarantee. We thus establish that the sample complexity of the log-concave MLE is near-optimal, up to an $\tilde{O}(1/ε)$ factor.

DSNov 20, 2017
List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians

Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart

We study the problem of list-decodable Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians. We develop a set of techniques that yield new efficient algorithms with significantly improved guarantees for these problems. {\bf List-Decodable Mean Estimation.} Fix any $d \in \mathbb{Z}_+$ and $0< α<1/2$. We design an algorithm with runtime $O (\mathrm{poly}(n/α)^{d})$ that outputs a list of $O(1/α)$ many candidate vectors such that with high probability one of the candidates is within $\ell_2$-distance $O(α^{-1/(2d)})$ from the true mean. The only previous algorithm for this problem achieved error $\tilde O(α^{-1/2})$ under second moment conditions. For $d = O(1/ε)$, our algorithm runs in polynomial time and achieves error $O(α^ε)$. We also give a Statistical Query lower bound suggesting that the complexity of our algorithm is qualitatively close to best possible. {\bf Learning Mixtures of Spherical Gaussians.} We give a learning algorithm for mixtures of spherical Gaussians that succeeds under significantly weaker separation assumptions compared to prior work. For the prototypical case of a uniform mixture of $k$ identity covariance Gaussians we obtain: For any $ε>0$, if the pairwise separation between the means is at least $Ω(k^ε+\sqrt{\log(1/δ)})$, our algorithm learns the unknown parameters within accuracy $δ$ with sample complexity and running time $\mathrm{poly} (n, 1/δ, (k/ε)^{1/ε})$. The previously best known polynomial time algorithm required separation at least $k^{1/4} \mathrm{polylog}(k/δ)$. Our main technical contribution is a new technique, using degree-$d$ multivariate polynomials, to remove outliers from high-dimensional datasets where the majority of the points are corrupted.

DSSep 7, 2017
Sharp Bounds for Generalized Uniformity Testing

Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart

We study the problem of generalized uniformity testing \cite{BC17} of a discrete probability distribution: Given samples from a probability distribution $p$ over an {\em unknown} discrete domain $\mathbfΩ$, we want to distinguish, with probability at least $2/3$, between the case that $p$ is uniform on some {\em subset} of $\mathbfΩ$ versus $ε$-far, in total variation distance, from any such uniform distribution. We establish tight bounds on the sample complexity of generalized uniformity testing. In more detail, we present a computationally efficient tester whose sample complexity is optimal, up to constant factors, and a matching information-theoretic lower bound. Specifically, we show that the sample complexity of generalized uniformity testing is $Θ\left(1/(ε^{4/3}\|p\|_3) + 1/(ε^{2} \|p\|_2) \right)$.

LGJul 5, 2017
Learning Geometric Concepts with Nasty Noise

Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart

We study the efficient learnability of geometric concept classes - specifically, low-degree polynomial threshold functions (PTFs) and intersections of halfspaces - when a fraction of the data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces. Specifically, our robust learning algorithm for low-degree PTFs succeeds under a number of tame distributions -- including the Gaussian distribution and, more generally, any log-concave distribution with (approximately) known low-degree moments. For LTFs under the Gaussian distribution, we give a polynomial-time algorithm that achieves error $O(ε)$, where $ε$ is the noise rate. At the core of our PAC learning results is an efficient algorithm to approximate the low-degree Chow-parameters of any bounded function in the presence of nasty noise. To achieve this, we employ an iterative spectral method for outlier detection and removal, inspired by recent work in robust unsupervised learning. Our aforementioned algorithm succeeds for a range of distributions satisfying mild concentration bounds and moment assumptions. The correctness of our robust learning algorithm for intersections of halfspaces makes essential use of a novel robust inverse independence lemma that may be of broader interest.

DSApr 12, 2017
Robustly Learning a Gaussian: Getting Optimal Error, Efficiently

Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane et al.

We study the fundamental problem of learning the parameters of a high-dimensional Gaussian in the presence of noise -- where an $\varepsilon$-fraction of our samples were chosen by an adversary. We give robust estimators that achieve estimation error $O(\varepsilon)$ in the total variation distance, which is optimal up to a universal constant that is independent of the dimension. In the case where just the mean is unknown, our robustness guarantee is optimal up to a factor of $\sqrt{2}$ and the running time is polynomial in $d$ and $1/ε$. When both the mean and covariance are unknown, the running time is polynomial in $d$ and quasipolynomial in $1/\varepsilon$. Moreover all of our algorithms require only a polynomial number of samples. Our work shows that the same sorts of error guarantees that were established over fifty years ago in the one-dimensional setting can also be achieved by efficient algorithms in high-dimensional settings.

LGMar 2, 2017
Being Robust (in High Dimensions) Can Be Practical

Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane et al.

Robust estimation is much more challenging in high dimensions than it is in one dimension: Most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility.

DSDec 9, 2016
Testing Bayesian Networks

Clement Canonne, Ilias Diakonikolas, Daniel Kane et al.

This work initiates a systematic investigation of testing high-dimensional structured distributions by focusing on testing Bayesian networks -- the prototypical family of directed graphical models. A Bayesian network is defined by a directed acyclic graph, where we associate a random variable with each node. The value at any particular node is conditionally independent of all the other non-descendant nodes once its parents are fixed. Specifically, we study the properties of identity testing and closeness testing of Bayesian networks. Our main contribution is the first non-trivial efficient testing algorithms for these problems and corresponding information-theoretic lower bounds. For a wide range of parameter settings, our testing algorithms have sample complexity sublinear in the dimension and are sample-optimal, up to constant factors.

LGNov 10, 2016
Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures

Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart

We describe a general technique that yields the first {\em Statistical Query lower bounds} for a range of fundamental high-dimensional learning problems involving Gaussian distributions. Our main results are for the problems of (1) learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of a single unknown Gaussian distribution. For each of these problems, we show a {\em super-polynomial gap} between the (information-theoretic) sample complexity and the computational complexity of {\em any} Statistical Query algorithm for the problem. Our SQ lower bound for Problem (1) is qualitatively matched by known learning algorithms for GMMs. Our lower bound for Problem (2) implies that the accuracy of the robust learning algorithm in~\cite{DiakonikolasKKLMS16} is essentially best possible among all polynomial-time SQ algorithms. Our SQ lower bounds are attained via a unified moment-matching technique that is useful in other contexts and may be of broader interest. Our technique yields nearly-tight lower bounds for a number of related unsupervised estimation problems. Specifically, for the problems of (3) robust covariance estimation in spectral norm, and (4) robust sparse mean estimation, we establish a quadratic {\em statistical--computational tradeoff} for SQ algorithms, matching known upper bounds. Finally, our technique can be used to obtain tight sample complexity lower bounds for high-dimensional {\em testing} problems. Specifically, for the classical problem of robustly {\em testing} an unknown mean (known covariance) Gaussian, our technique implies an information-theoretic sample lower bound that scales {\em linearly} in the dimension. Our sample lower bound matches the sample complexity of the corresponding robust {\em learning} problem and separates the sample complexity of robust testing from standard (non-robust) testing.

DSJun 23, 2016
Robust Learning of Fixed-Structure Bayesian Networks

Yu Cheng, Ilias Diakonikolas, Daniel Kane et al.

We investigate the problem of learning Bayesian networks in a robust model where an $ε$-fraction of the samples are adversarially corrupted. In this work, we study the fully observable discrete case where the structure of the network is given. Even in this basic setting, previous learning algorithms either run in exponential time or lose dimension-dependent factors in their error guarantees. We provide the first computationally efficient robust learning algorithm for this problem with dimension-independent error guarantees. Our algorithm has near-optimal sample complexity, runs in polynomial time, and achieves error that scales nearly-linearly with the fraction of adversarially corrupted samples. Finally, we show on both synthetic and semi-synthetic data that our algorithm performs well in practice.

DSJun 9, 2016
Efficient Robust Proper Learning of Log-concave Distributions

Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart

We study the {\em robust proper learning} of univariate log-concave distributions (over continuous and discrete domains). Given a set of samples drawn from an unknown target distribution, we want to compute a log-concave hypothesis distribution that is as close as possible to the target, in total variation distance. In this work, we give the first computationally efficient algorithm for this learning problem. Our algorithm achieves the information-theoretically optimal sample size (up to a constant factor), runs in polynomial time, and is robust to model misspecification with nearly-optimal error guarantees. Specifically, we give an algorithm that, on input $n=O(1/\eps^{5/2})$ samples from an unknown distribution $f$, runs in time $\widetilde{O}(n^{8/5})$, and outputs a log-concave hypothesis $h$ that (with high probability) satisfies $\dtv(h, f) = O(\opt)+\eps$, where $\opt$ is the minimum total variation distance between $f$ and the class of log-concave distributions. Our approach to the robust proper learning problem is quite flexible and may be applicable to many other univariate distribution families.

LGMay 26, 2016
Learning Multivariate Log-concave Distributions

Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart

We study the problem of estimating multivariate log-concave probability density functions. We prove the first sample complexity upper bound for learning log-concave densities on $\mathbb{R}^d$, for all $d \geq 1$. Prior to our work, no upper bound on the sample complexity of this learning problem was known for the case of $d>3$. In more detail, we give an estimator that, for any $d \ge 1$ and $ε>0$, draws $\tilde{O}_d \left( (1/ε)^{(d+5)/2} \right)$ samples from an unknown target log-concave density on $\mathbb{R}^d$, and outputs a hypothesis that (with high probability) is $ε$-close to the target, in total variation distance. Our upper bound on the sample complexity comes close to the known lower bound of $Ω_d \left( (1/ε)^{(d+1)/2} \right)$ for this problem.

DSApr 21, 2016
Robust Estimators in High Dimensions without the Computational Intractability

Ilias Diakonikolas, Gautam Kamath, Daniel Kane et al.

We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an $\varepsilon$-fraction of the samples. Such questions have a rich history spanning statistics, machine learning and theoretical computer science. Even in the most basic settings, the only known approaches are either computationally inefficient or lose dimension-dependent factors in their error guarantees. This raises the following question:Is high-dimensional agnostic distribution learning even possible, algorithmically? In this work, we obtain the first computationally efficient algorithms with dimension-independent error guarantees for agnostically learning several fundamental classes of high-dimensional distributions: (1) a single Gaussian, (2) a product distribution on the hypercube, (3) mixtures of two product distributions (under a natural balancedness condition), and (4) mixtures of spherical Gaussians. Our algorithms achieve error that is independent of the dimension, and in many cases scales nearly-linearly with the fraction of adversarially corrupted samples. Moreover, we develop a general recipe for detecting and correcting corruptions in high-dimensions, that may be applicable to many other problems.

DSNov 12, 2015
Properly Learning Poisson Binomial Distributions in Almost Polynomial Time

Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart

We give an algorithm for properly learning Poisson binomial distributions. A Poisson binomial distribution (PBD) of order $n$ is the discrete probability distribution of the sum of $n$ mutually independent Bernoulli random variables. Given $\widetilde{O}(1/ε^2)$ samples from an unknown PBD $\mathbf{p}$, our algorithm runs in time $(1/ε)^{O(\log \log (1/ε))}$, and outputs a hypothesis PBD that is $ε$-close to $\mathbf{p}$ in total variation distance. The previously best known running time for properly learning PBDs was $(1/ε)^{O(\log(1/ε))}$. As one of our main contributions, we provide a novel structural characterization of PBDs. We prove that, for all $ε>0,$ there exists an explicit collection $\cal{M}$ of $(1/ε)^{O(\log \log (1/ε))}$ vectors of multiplicities, such that for any PBD $\mathbf{p}$ there exists a PBD $\mathbf{q}$ with $O(\log(1/ε))$ distinct parameters whose multiplicities are given by some element of ${\cal M}$, such that $\mathbf{q}$ is $ε$-close to $\mathbf{p}$. Our proof combines tools from Fourier analysis and algebraic geometry. Our approach to the proper learning problem is as follows: Starting with an accurate non-proper hypothesis, we fit a PBD to this hypothesis. More specifically, we essentially start with the hypothesis computed by the computationally efficient non-proper learning algorithm in our recent work~\cite{DKS15}. Our aforementioned structural characterization allows us to reduce the corresponding fitting problem to a collection of $(1/ε)^{O(\log \log(1/ε))}$ systems of low-degree polynomial inequalities. We show that each such system can be solved in time $(1/ε)^{O(\log \log(1/ε))}$, which yields the overall running time of our algorithm.

DSNov 11, 2015
The Fourier Transform of Poisson Multinomial Distributions and its Algorithmic Applications

Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart

An $(n, k)$-Poisson Multinomial Distribution (PMD) is a random variable of the form $X = \sum_{i=1}^n X_i$, where the $X_i$'s are independent random vectors supported on the set of standard basis vectors in $\mathbb{R}^k.$ In this paper, we obtain a refined structural understanding of PMDs by analyzing their Fourier transform. As our core structural result, we prove that the Fourier transform of PMDs is {\em approximately sparse}, i.e., roughly speaking, its $L_1$-norm is small outside a small set. By building on this result, we obtain the following applications: {\bf Learning Theory.} We design the first computationally efficient learning algorithm for PMDs with respect to the total variation distance. Our algorithm learns an arbitrary $(n, k)$-PMD within variation distance $ε$ using a near-optimal sample size of $\widetilde{O}_k(1/ε^2),$ and runs in time $\widetilde{O}_k(1/ε^2) \cdot \log n.$ Previously, no algorithm with a $\mathrm{poly}(1/ε)$ runtime was known, even for $k=3.$ {\bf Game Theory.} We give the first efficient polynomial-time approximation scheme (EPTAS) for computing Nash equilibria in anonymous games. For normalized anonymous games with $n$ players and $k$ strategies, our algorithm computes a well-supported $ε$-Nash equilibrium in time $n^{O(k^3)} \cdot (k/ε)^{O(k^3\log(k/ε)/\log\log(k/ε))^{k-1}}.$ The best previous algorithm for this problem had running time $n^{(f(k)/ε)^k},$ where $f(k) = Ω(k^{k^2})$, for any $k>2.$ {\bf Statistics.} We prove a multivariate central limit theorem (CLT) that relates an arbitrary PMD to a discretized multivariate Gaussian with the same mean and covariance, in total variation distance. Our new CLT strengthens the CLT of Valiant and Valiant by completely removing the dependence on $n$ in the error bound.

DSMay 4, 2015
Optimal Learning via the Fourier Transform for Sums of Independent Integer Random Variables

Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart

We study the structure and learnability of sums of independent integer random variables (SIIRVs). For $k \in \mathbb{Z}_{+}$, a $k$-SIIRV of order $n \in \mathbb{Z}_{+}$ is the probability distribution of the sum of $n$ independent random variables each supported on $\{0, 1, \dots, k-1\}$. We denote by ${\cal S}_{n,k}$ the set of all $k$-SIIRVs of order $n$. In this paper, we tightly characterize the sample and computational complexity of learning $k$-SIIRVs. More precisely, we design a computationally efficient algorithm that uses $\widetilde{O}(k/ε^2)$ samples, and learns an arbitrary $k$-SIIRV within error $ε,$ in total variation distance. Moreover, we show that the {\em optimal} sample complexity of this learning problem is $Θ((k/ε^2)\sqrt{\log(1/ε)}).$ Our algorithm proceeds by learning the Fourier transform of the target $k$-SIIRV in its effective support. Its correctness relies on the {\em approximate sparsity} of the Fourier transform of $k$-SIIRVs -- a structural property that we establish, roughly stating that the Fourier transform of $k$-SIIRVs has small magnitude outside a small set. Along the way we prove several new structural results about $k$-SIIRVs. As one of our main structural contributions, we give an efficient algorithm to construct a sparse {\em proper} $ε$-cover for ${\cal S}_{n,k},$ in total variation distance. We also obtain a novel geometric characterization of the space of $k$-SIIRVs. Our characterization allows us to prove a tight lower bound on the size of $ε$-covers for ${\cal S}_{n,k}$, and is the key ingredient in our tight sample complexity lower bound. Our approach of exploiting the sparsity of the Fourier transform in distribution learning is general, and has recently found additional applications.