Santosh Vempala

18papers

1,023citations

Novelty71%

AI Score48

Ranked #50,267 of 205,806 authors (top 24%)#11,363 in LG (top 27%)

18 Papers

84.7DSApr 16

Tight Bounds for Learning Polyhedra with a Margin

Shyamal Patel, Santosh Vempala

We give an algorithm for PAC learning intersections of $k$ halfspaces with a $ρ$ margin to within error $\varepsilon$ that runs in time $\textsf{poly}(k, \varepsilon^{-1}, ρ^{-1}) \cdot \exp \left(O(\sqrt{n \log(1/ρ) \log k})\right)$. Notably, this improves on prior work which had an exponential dependence on either $k$ or $ρ^{-1}$ and matches known cryptographic and Statistical Query lower bounds up to the logarithmic factors in $k$ and $ρ$ in the exponent. Our learning algorithm extends to the more general setting when we are only promised that most points have distance at least $ρ$ from the boundary of the polyhedron, making it applicable to continuous distributions as well.

LGJun 17, 2020

Socially Fair k-Means Clustering

Mehrdad Ghadiri, Samira Samadi, Santosh Vempala

We show that the popular k-means clustering algorithm (Lloyd's heuristic), used for a variety of scientific data, can result in outcomes that are unfavorable to subgroups of data (e.g., demographic groups). Such biased clusterings can have deleterious implications for human-centric applications such as resource allocation. We present a fair k-means objective and algorithm to choose cluster centers that provide equitable costs for different groups. The algorithm, Fair-Lloyd, is a modification of Lloyd's heuristic for k-means, inheriting its simplicity, efficiency, and stability. In comparison with standard Lloyd's, we find that on benchmark datasets, Fair-Lloyd exhibits unbiased performance by ensuring that all groups have equal costs in the output k-clustering, while incurring a negligible increase in running time, thus making it a viable fair option wherever k-means is currently used.

DSNov 26, 2019

Robustly Clustering a Mixture of Gaussians

He Jia, Santosh Vempala

We give an efficient algorithm for robustly clustering of a mixture of two arbitrary Gaussians, a central open problem in the theory of computationally efficient robust estimation, assuming only that the the means of the component Gaussians are well-separated or their covariances are well-separated. Our algorithm and analysis extend naturally to robustly clustering mixtures of well-separated strongly logconcave distributions. The mean separation required is close to the smallest possible to guarantee that most of the measure of each component can be separated by some hyperplane (for covariances, it is the same condition in the second degree polynomial kernel). We also show that for Gaussian mixtures, separation in total variation distance suffices to achieve robust clustering. Our main tools are a new identifiability criterion based on isotropic position and the Fisher discriminant, and a corresponding Sum-of-Squares convex programming relaxation, of fixed degree.

CRMay 31, 2019

Human-Usable Password Schemas: Beyond Information-Theoretic Security

Elan Rosenfeld, Santosh Vempala, Manuel Blum

Password users frequently employ passwords that are too simple, or they just reuse passwords for multiple websites. A common complaint is that utilizing secure passwords is too difficult. One possible solution to this problem is to use a password schema. Password schemas are deterministic functions which map challenges (typically the website name) to responses (passwords). Previous work has been done on developing and analyzing publishable schemas, but these analyses have been information-theoretic, not complexity-theoretic; they consider an adversary with infinite computing power. We perform an analysis with respect to adversaries having currently achievable computing capabilities, assessing the realistic practical security of such schemas. We prove for several specific schemas that a computer is no worse off than an infinite adversary and that it can successfully extract all information from leaked challenges and their respective responses, known as challenge-response pairs. We also show that any schema that hopes to be secure against adversaries with bounded computation should obscure information in a very specific way, by introducing many possible constraints with each challenge-response pair. These surprising results put the analyses of password schemas on a more solid and practical footing.

DMFeb 28, 2019

Multi-Criteria Dimensionality Reduction with Applications to Fairness

Uthaipon Tantipongpipat, Samira Samadi, Mohit Singh et al.

Dimensionality reduction is a classical technique widely used for data analysis. One foundational instantiation is Principal Component Analysis (PCA), which minimizes the average reconstruction error. In this paper, we introduce the "multi-criteria dimensionality reduction" problem where we are given multiple objectives that need to be optimized simultaneously. As an application, our model captures several fairness criteria for dimensionality reduction such as our novel Fair-PCA problem and the Nash Social Welfare (NSW) problem. In Fair-PCA, the input data is divided into $k$ groups, and the goal is to find a single $d$-dimensional representation for all groups for which the minimum variance of any one group is maximized. In NSW, the goal is to maximize the product of the individual variances of the groups achieved by the common low-dimensional space. Our main result is an exact polynomial-time algorithm for the two-criterion dimensionality reduction problem when the two criteria are increasing concave functions. As an application of this result, we obtain a polynomial time algorithm for Fair-PCA for $k=2$ groups and a polynomial time algorithm for NSW objective for $k=2$ groups. We also give approximation algorithms for $k>2$. Our technical contribution in the above results is to prove new low-rank properties of extreme point solutions to semi-definite programs. We conclude with experiments indicating the effectiveness of algorithms based on extreme point solutions of semi-definite programs on several real-world data sets.

LGOct 31, 2018

The Price of Fair PCA: One Extra Dimension

Samira Samadi, Uthaipon Tantipongpipat, Jamie Morgenstern et al.

We investigate whether the standard dimensionality reduction technique of PCA inadvertently produces data representations with different fidelity for two different populations. We show on several real-world data sets, PCA has higher reconstruction error on population A than on B (for example, women versus men or lower- versus higher-educated individuals). This can happen even when the data set has a similar number of samples from A and B. This motivates our study of dimensionality reduction techniques which maintain similar fidelity for A and B. We define the notion of Fair PCA and give a polynomial-time algorithm for finding a low dimensional representation of the data which is nearly-optimal with respect to this measure. Finally, we show on real-world data sets that our algorithm can be used to efficiently generate a fair low dimensional representation of the data.

DSOct 28, 2018

Smoothed Analysis of Discrete Tensor Decomposition and Assemblies of Neurons

Nima Anari, Constantinos Daskalakis, Wolfgang Maass et al.

We analyze linear independence of rank one tensors produced by tensor powers of randomly perturbed vectors. This enables efficient decomposition of sums of high-order tensors. Our analysis builds upon [BCMV14] but allows for a wider range of perturbation models, including discrete ones. We give an application to recovering assemblies of neurons. Assemblies are large sets of neurons representing specific memories or concepts. The size of the intersection of two assemblies has been shown in experiments to represent the extent to which these memories co-occur or these concepts are related; the phenomenon is called association of assemblies. This suggests that an animal's memory is a complex web of associations, and poses the problem of recovering this representation from cognitive data. Motivated by this problem, we study the following more general question: Can we reconstruct the Venn diagram of a family of sets, given the sizes of their $\ell$-wise intersections? We show that as long as the family of sets is randomly perturbed, it is enough for the number of measurements to be polynomially larger than the number of nonempty regions of the Venn diagram to fully reconstruct the diagram.

LGMay 7, 2018

Gradient Descent for One-Hidden-Layer Neural Networks: Polynomial Convergence and SQ Lower Bounds

Santosh Vempala, John Wilmes

We study the complexity of training neural network models with one hidden nonlinear activation layer and an output weighted sum layer. We analyze Gradient Descent applied to learning a bounded target function on $n$ real-valued inputs. We give an agnostic learning guarantee for GD: starting from a randomly initialized network, it converges in mean squared loss to the minimum error (in $2$-norm) of the best approximation of the target function using a polynomial of degree at most $k$. Moreover, for any $k$, the size of the network and number of iterations needed are both bounded by $n^{O(k)}\log(1/ε)$. In particular, this applies to training networks of unbiased sigmoids and ReLUs. We also rigorously explain the empirical finding that gradient descent discovers lower frequency Fourier components before higher frequency components. We complement this result with nearly matching lower bounds in the Statistical Query model. GD fits well in the SQ framework since each training step is determined by an expectation over the input distribution. We show that any SQ algorithm that achieves significant improvement over a constant function with queries of tolerance some inverse polynomial in the input dimensionality $n$ must use $n^{Ω(k)}$ queries even when the target functions are restricted to a set of $n^{O(k)}$ degree-$k$ polynomials, and the input distribution is uniform over the unit sphere; for this class the information-theoretic lower bound is only $Θ(k \log n)$. Our approach for both parts is based on spherical harmonics. We view gradient descent as an operator on the space of functions, and study its dynamics. An essential tool is the Funk-Hecke theorem, which explains the eigenfunctions of this operator in the case of the mean squared loss.

HCDec 11, 2017

Usability of Humanly Computable Passwords

Samira Samadi, Santosh Vempala, Adam Tauman Kalai

Reusing passwords across multiple websites is a common practice that compromises security. Recently, Blum and Vempala have proposed password strategies to help people calculate, in their heads, passwords for different sites without dependence on third-party tools or external devices. Thus far, the security and efficiency of these "mental algorithms" has been analyzed only theoretically. But are such methods usable? We present the first usability study of humanly computable password strategies, involving a learning phase (to learn a password strategy), then a rehearsal phase (to login to a few websites), and multiple follow-up tests. In our user study, with training, participants were able to calculate a deterministic eight-character password for an arbitrary new website in under 20 seconds.

LGJul 14, 2017

On the Complexity of Learning Neural Networks

Le Song, Santosh Vempala, John Wilmes et al.

The stunning empirical successes of neural networks currently lack rigorous theoretical explanation. What form would such an explanation take, in the face of existing complexity-theoretic lower bounds? A first step might be to show that data generated by neural networks with a single hidden layer, smooth activation functions and benign input distributions can be learned efficiently. We demonstrate here a comprehensive lower bound ruling out this possibility: for a wide class of activation functions (including all currently used), and inputs drawn from any logconcave distribution, there is a family of one-hidden-layer functions whose output is a sum gate, that are hard to learn in a precise sense: any statistical query algorithm (which includes all known variants of stochastic gradient descent with any loss function) needs an exponential number of queries even using tolerance inversely proportional to the input dimensionality. Moreover, this hard family of functions is realizable with a small (sublinear in dimension) number of activation units in the single hidden layer. The lower bound is also robust to small perturbations of the true weights. Systematic experiments illustrate a phase transition in the training error as predicted by the analysis.

HCJul 5, 2017

The Complexity of Human Computation: A Concrete Model with an Application to Passwords

Manuel Blum, Santosh Vempala

What can humans compute in their heads? We are thinking of a variety of Crypto Protocols, games like Sudoku, Crossword Puzzles, Speed Chess, and so on. The intent of this paper is to apply the ideas and methods of theoretical computer science to better understand what humans can compute in their heads. For example, can a person compute a function in their head so that an eavesdropper with a powerful computer --- who sees the responses to random input --- still cannot infer responses to new inputs? To address such questions, we propose a rigorous model of human computation and associated measures of complexity. We apply the model and measures first and foremost to the problem of (1) humanly computable password generation, and then consider related problems of (2) humanly computable "one-way functions" and (3) humanly computable "pseudorandom generators". The theory of Human Computability developed here plays by different rules than standard computability, and this takes some getting used to. For reasons to be made clear, the polynomial versus exponential time divide of modern computability theory is irrelevant to human computation. In human computability, the step-counts for both humans and computers must be more concrete. Specifically, we restrict the adversary to at most 10^24 (Avogadro number of) steps. An alternate view of this work is that it deals with the analysis of algorithms and counting steps for the case that inputs are small as opposed to the usual case of inputs large-in-the-limit.

LGAug 12, 2016

Chi-squared Amplification: Identifying Hidden Hubs

Ravi Kannan, Santosh Vempala

We consider the following general hidden hubs model: an $n \times n$ random matrix $A$ with a subset $S$ of $k$ special rows (hubs): entries in rows outside $S$ are generated from the probability distribution $p_0 \sim N(0,σ_0^2)$; for each row in $S$, some $k$ of its entries are generated from $p_1 \sim N(0,σ_1^2)$, $σ_1>σ_0$, and the rest of the entries from $p_0$. The problem is to identify the high-degree hubs efficiently. This model includes and significantly generalizes the planted Gaussian Submatrix Model, where the special entries are all in a $k \times k$ submatrix. There are two well-known barriers: if $k\geq c\sqrt{n\ln n}$, just the row sums are sufficient to find $S$ in the general model. For the submatrix problem, this can be improved by a $\sqrt{\ln n}$ factor to $k \ge c\sqrt{n}$ by spectral methods or combinatorial methods. In the variant with $p_0=\pm 1$ (with probability $1/2$ each) and $p_1\equiv 1$, neither barrier has been broken. We give a polynomial-time algorithm to identify all the hidden hubs with high probability for $k \ge n^{0.5-δ}$ for some $δ>0$, when $σ_1^2>2σ_0^2$. The algorithm extends to the setting where planted entries might have different variances each at least as large as $σ_1^2$. We also show a nearly matching lower bound: for $σ_1^2 \le 2σ_0^2$, there is no polynomial-time Statistical Query algorithm for distinguishing between a matrix whose entries are all from $N(0,σ_0^2)$ and a matrix with $k=n^{0.5-δ}$ hidden hubs for any $δ>0$. The lower bound as well as the algorithm are related to whether the chi-squared distance of the two distributions diverges. At the critical value $σ_1^2=2σ_0^2$, we show that the general hidden hubs problem can be solved for $k\geq c\sqrt n(\ln n)^{1/4}$, improving on the naive row sum-based method.

DSApr 24, 2016

Agnostic Estimation of Mean and Covariance

Kevin A. Lai, Anup B. Rao, Santosh Vempala

We consider the problem of estimating the mean and covariance of a distribution from iid samples in $\mathbb{R}^n$, in the presence of an $η$ fraction of malicious noise; this is in contrast to much recent work where the noise itself is assumed to be from a distribution of known type. The agnostic problem includes many interesting special cases, e.g., learning the parameters of a single Gaussian (or finding the best-fit Gaussian) when $η$ fraction of data is adversarially corrupted, agnostically learning a mixture of Gaussians, agnostic ICA, etc. We present polynomial-time algorithms to estimate the mean and covariance with error guarantees in terms of information-theoretic lower bounds. As a corollary, we also obtain an agnostic algorithm for Singular Value Decomposition.

NEFeb 26, 2016

Cortical Computation via Iterative Constructions

Christos Papadimitrou, Samantha Petti, Santosh Vempala

We study Boolean functions of an arbitrary number of input variables that can be realized by simple iterative constructions based on constant-size primitives. This restricted type of construction needs little global coordination or control and thus is a candidate for neurally feasible computation. Valiant's construction of a majority function can be realized in this manner and, as we show, can be generalized to any uniform threshold function. We study the rate of convergence, finding that while linear convergence to the correct function can be achieved for any threshold using a fixed set of primitives, for quadratic convergence, the size of the primitives must grow as the threshold approaches 0 or 1. We also study finite realizations of this process and the learnability of the functions realized. We show that the constructions realized are accurate outside a small interval near the target threshold, where the size of the construction grows as the inverse square of the interval width. This phenomenon, that errors are higher closer to thresholds (and thresholds closer to the boundary are harder to represent), is a well-known cognitive finding.

LGDec 30, 2015

Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization

Vitaly Feldman, Cristobal Guzman, Santosh Vempala

Stochastic convex optimization, where the objective is the expectation of a random convex function, is an important and widely used method with numerous applications in machine learning, statistics, operations research and other areas. We study the complexity of stochastic convex optimization given only statistical query (SQ) access to the objective function. We show that well-known and popular first-order iterative methods can be implemented using only statistical queries. For many cases of interest we derive nearly matching upper and lower bounds on the estimation (sample) complexity including linear optimization in the most general setting. We then present several consequences for machine learning, differential privacy and proving concrete lower bounds on the power of convex optimization based methods. The key ingredient of our work is SQ algorithms and lower bounds for estimating the mean vector of a distribution over vectors supported on a convex body in $\mathbb{R}^d$. This natural problem has not been previously studied and we show that our solutions can be used to get substantially improved SQ versions of Perceptron and other online algorithms for learning halfspaces.

LGNov 6, 2014

Efficient Representations for Life-Long Learning and Autoencoding

Maria-Florina Balcan, Avrim Blum, Santosh Vempala

It has been a long-standing goal in machine learning, as well as in AI more generally, to develop life-long learning systems that learn many different tasks over time, and reuse insights from tasks learned, "learning to learn" as they do so. In this work we pose and provide efficient algorithms for several natural theoretical formulations of this goal. Specifically, we consider the problem of learning many different target functions over time, that share certain commonalities that are initially unknown to the learning algorithm. Our aim is to learn new internal representations as the algorithm learns new target functions, that capture this commonality and allow subsequent learning tasks to be solved more efficiently and from less data. We develop efficient algorithms for two very different kinds of commonalities that target functions might share: one based on learning common low-dimensional and unions of low-dimensional subspaces and one based on learning nonlinear Boolean combinations of features. Our algorithms for learning Boolean feature combinations additionally have a dual interpretation, and can be viewed as giving an efficient procedure for constructing near-optimal sparse Boolean autoencoders under a natural "anchor-set" assumption.

CRMar 31, 2014

Towards Human Computable Passwords

Jeremiah Blocki, Manuel Blum, Anupam Datta et al.

An interesting challenge for the cryptography community is to design authentication protocols that are so simple that a human can execute them without relying on a fully trusted computer. We propose several candidate authentication protocols for a setting in which the human user can only receive assistance from a semi-trusted computer --- a computer that stores information and performs computations correctly but does not provide confidentiality. Our schemes use a semi-trusted computer to store and display public challenges $C_i\in[n]^k$. The human user memorizes a random secret mapping $σ:[n]\rightarrow\mathbb{Z}_d$ and authenticates by computing responses $f(σ(C_i))$ to a sequence of public challenges where $f:\mathbb{Z}_d^k\rightarrow\mathbb{Z}_d$ is a function that is easy for the human to evaluate. We prove that any statistical adversary needs to sample $m=\tildeΩ(n^{s(f)})$ challenge-response pairs to recover $σ$, for a security parameter $s(f)$ that depends on two key properties of $f$. To obtain our results, we apply the general hypercontractivity theorem to lower bound the statistical dimension of the distribution over challenge-response pairs induced by $f$ and $σ$. Our lower bounds apply to arbitrary functions $f $ (not just to functions that are easy for a human to evaluate), and generalize recent results of Feldman et al. As an application, we propose a family of human computable password functions $f_{k_1,k_2}$ in which the user needs to perform $2k_1+2k_2+1$ primitive operations (e.g., adding two digits or remembering $σ(i)$), and we show that $s(f) = \min\{k_1+1, (k_2+1)/2\}$. For these schemes, we prove that forging passwords is equivalent to recovering the secret mapping. Thus, our human computable password schemes can maintain strong security guarantees even after an adversary has observed the user login to many different accounts.

LGJun 25, 2013

Fourier PCA and Robust Tensor Decomposition

Navin Goyal, Santosh Vempala, Ying Xiao

Fourier PCA is Principal Component Analysis of a matrix obtained from higher order derivatives of the logarithm of the Fourier transform of a distribution.We make this method algorithmic by developing a tensor decomposition method for a pair of tensors sharing the same vectors in rank-$1$ decompositions. Our main application is the first provably polynomial-time algorithm for underdetermined ICA, i.e., learning an $n \times m$ matrix $A$ from observations $y=Ax$ where $x$ is drawn from an unknown product distribution with arbitrary non-Gaussian components. The number of component distributions $m$ can be arbitrarily higher than the dimension $n$ and the columns of $A$ only need to satisfy a natural and efficiently verifiable nondegeneracy condition. As a second application, we give an alternative algorithm for learning mixtures of spherical Gaussians with linearly independent means. These results also hold in the presence of Gaussian noise.