Ravindran Kannan

6papers

69citations

Novelty64%

AI Score43

Ranked #78,071 of 201,326 authors (top 39%)#269 in DS (top 48%)

6 Papers

DSApr 8, 2010

Spectral Methods for Matrices and Tensors

Ravindran Kannan

While Spectral Methods have long been used for Principal Component Analysis, this survey focusses on work over the last 15 years with three salient features: (i) Spectral methods are useful not only for numerical problems, but also discrete optimization problems (Constraint Optimization Problems - CSP's) like the max. cut problem and similar mathematical considerations underlie both areas. (ii) Spectral methods can be extended to tensors. The theory and algorithms for tensors are not as simple/clean as for matrices, but the survey describes methods for low-rank approximation which extend to tensors. These tensor approximations help us solve Max-$r$-CSP's for $r>2$ as well as numerical tensor problems. (iii) Sampling on the fly plays a prominent role in these methods. A primary result is that for any matrix, a random submatrix of rows/columns picked with probabilities proportional to the squared lengths (of rows/columns), yields estimates of the singular values as well as an approximation to the whole matrix.

LGJul 21, 2023

Random Separating Hyperplane Theorem and Learning Polytopes

Chiranjib Bhattacharyya, Ravindran Kannan, Amit Kumar

The Separating Hyperplane theorem is a fundamental result in Convex Geometry with myriad applications. Our first result, Random Separating Hyperplane Theorem (RSH), is a strengthening of this for polytopes. $\rsh$ asserts that if the distance between $a$ and a polytope $K$ with $k$ vertices and unit diameter in $\Re^d$ is at least $δ$, where $δ$ is a fixed constant in $(0,1)$, then a randomly chosen hyperplane separates $a$ and $K$ with probability at least $1/poly(k)$ and margin at least $Ω\left(δ/\sqrt{d} \right)$. An immediate consequence of our result is the first near optimal bound on the error increase in the reduction from a Separation oracle to an Optimization oracle over a polytope. RSH has algorithmic applications in learning polytopes. We consider a fundamental problem, denoted the ``Hausdorff problem'', of learning a unit diameter polytope $K$ within Hausdorff distance $δ$, given an optimization oracle for $K$. Using RSH, we show that with polynomially many random queries to the optimization oracle, $K$ can be approximated within error $O(δ)$. To our knowledge this is the first provable algorithm for the Hausdorff Problem. Building on this result, we show that if the vertices of $K$ are well-separated, then an optimization oracle can be used to generate a list of points, each within Hausdorff distance $O(δ)$ of $K$, with the property that the list contains a point close to each vertex of $K$. Further, we show how to prune this list to generate a (unique) approximation to each vertex of the polytope. We prove that in many latent variable settings, e.g., topic modeling, LDA, optimization oracles do exist provided we project to a suitable SVD subspace. Thus, our work yields the first efficient algorithm for finding approximations to the vertices of the latent polytope under the well-separatedness assumption.

DSApr 4

SVD Provably Denoises Nearest Neighbor Data

Ravindran Kannan, Kijun Shin, David Woodruff

We study the Nearest Neighbor Search (NNS) problem in a high-dimensional setting where data lies in a low-dimensional subspace and is corrupted by Gaussian noise. Specifically, we consider a semi-random model in which $n$ points from an unknown $k$-dimensional subspace of $\mathbb{R}^d$ ($k \ll d$) are perturbed by zero-mean $d$-dimensional Gaussian noise with variance $Ï^2$ per coordinate. Assuming the second-nearest neighbor is at least a factor $(1+\varepsilon)$ farther from the query than the nearest neighbor, and given only the noisy data, our goal is to recover the nearest neighbor in the uncorrupted data. We prove three results. First, for $Ï\in O(1/k^{1/4})$, simply performing SVD denoises the data and provably recovers the correct nearest neighbor of the uncorrupted data. Second, for $Ï\gg 1/k^{1/4}$, the nearest neighbor in the uncorrupted data is not even identifiable from the noisy data in general, giving a matching lower bound and showing the necessity of this threshold for NNS. Third, for $Ï\gg 1/\sqrt{k}$, the noise magnitude $Ï\sqrt d$ significantly exceeds inter-point distances in the unperturbed data, and the nearest neighbor in the noisy data generally differs from that in the uncorrupted data. Thus, the first and third results together imply that SVD can identify the correct nearest neighbor even in regimes where naive nearest neighbor search on the noisy data fails. Compared to \citep{abdullah2014spectral}, our result does not require $Ï$ to be at least an inverse polynomial in the ambient dimension $d$. Our analysis uses perturbation bounds for singular spaces together with Gaussian concentration and spherical symmetry. We also provide empirical results on real datasets supporting our theory.

DSDec 8, 2020

Algorithms for finding $k$ in $k$-means

Chiranjib Bhattacharyya, Ravindran Kannan, Amit Kumar

$k-$means Clustering requires as input the exact value of $k$, the number of clusters. Two challenges are open: (i) Is there a data-determined definition of $k$ which is provably correct and (ii) Is there a polynomial time algorithm to find $k$ from data ? This paper provides the first affirmative answers to both these questions. As common in the literature, we assume that the data admits an unknown Ground Truth (GT) clustering with cluster centers separated. This assumption alone is not sufficient to answer Yes to (i). We assume a novel, but natural second constraint called no tight sub-cluster (NTSC) which stipulates that no substantially large subset of a GT cluster can be "tighter" (in a sense we define) than the cluster. Our yes answer to (i) and (ii) are under these two deterministic assumptions. We also give polynomial time algorithm to identify $k$. Our algorithm relies on NTSC to peel off one cluster at a time by identifying points which are tightly packed. We are also able to show that our algorithm(s) apply to data generated by mixtures of Gaussians and more generally to mixtures of sub-Gaussian pdf's and hence are able to find the number of components of the mixture from data. To our knowledge, previous results for these specialized settings as well, assume generally that $k$ is given besides the data.

LGApr 14, 2019

Finding a latent k-simplex in O(k . nnz(data)) time via Subset Smoothing

Chiranjib Bhattacharyya, Ravindran Kannan

In this paper we show that a large class of Latent variable models, such as Mixed Membership Stochastic Block(MMSB) Models, Topic Models, and Adversarial Clustering, can be unified through a geometric perspective, replacing model specific assumptions and algorithms for individual models. The geometric perspective leads to the formulation: \emph{find a latent $k-$ polytope $K$ in ${\bf R}^d$ given $n$ data points, each obtained by perturbing a latent point in $K$}. This problem does not seem to have been considered in the literature. The most important contribution of this paper is to show that the latent $k-$polytope problem admits an efficient algorithm under deterministic assumptions which naturally hold in Latent variable models considered in this paper. ur algorithm runs in time $O^*(k\; \mbox{nnz})$ matching the best running time of algorithms in special cases considered here and is better when the data is sparse, as is the case in applications. An important novelty of the algorithm is the introduction of \emph{subset smoothed polytope}, $K'$, the convex hull of the ${n\choose δn}$ points obtained by averaging all $δn$ subsets of the data points, for a given $δ\in (0,1)$. We show that $K$ and $K'$ are close in Hausdroff distance. Among the consequences of our algorithm are the following: (a) MMSB Models and Topic Models: the first quasi-input-sparsity time algorithm for parameter estimation for $k \in O^*(1)$, (b) Adversarial Clustering: In $k-$means, if, an adversary is allowed to move many data points from each cluster an arbitrary amount towards the convex hull of the centers of other clusters, our algorithm still estimates cluster centers well.

MLOct 26, 2014

A provable SVD-based algorithm for learning topics in dominant admixture corpus

Trapit Bansal, Chiranjib Bhattacharyya, Ravindran Kannan

Topic models, such as Latent Dirichlet Allocation (LDA), posit that documents are drawn from admixtures of distributions over words, known as topics. The inference problem of recovering topics from admixtures, is NP-hard. Assuming separability, a strong assumption, [4] gave the first provable algorithm for inference. For LDA model, [6] gave a provable algorithm using tensor-methods. But [4,6] do not learn topic vectors with bounded $l_1$ error (a natural measure for probability vectors). Our aim is to develop a model which makes intuitive and empirically supported assumptions and to design an algorithm with natural, simple components such as SVD, which provably solves the inference problem for the model with bounded $l_1$ error. A topic in LDA and other models is essentially characterized by a group of co-occurring words. Motivated by this, we introduce topic specific Catchwords, group of words which occur with strictly greater frequency in a topic than any other topic individually and are required to have high frequency together rather than individually. A major contribution of the paper is to show that under this more realistic assumption, which is empirically verified on real corpora, a singular value decomposition (SVD) based algorithm with a crucial pre-processing step of thresholding, can provably recover the topics from a collection of documents drawn from Dominant admixtures. Dominant admixtures are convex combination of distributions in which one distribution has a significantly higher contribution than others. Apart from the simplicity of the algorithm, the sample complexity has near optimal dependence on $w_0$, the lowest probability that a topic is dominant, and is better than [4]. Empirical evidence shows that on several real world corpora, both Catchwords and Dominant admixture assumptions hold and the proposed algorithm substantially outperforms the state of the art [5].