Kaushik Sinha

LG
5papers
190citations
Novelty47%
AI Score39

5 Papers

STFeb 5
Optimal rates for density and mode estimation with expand-and-sparsify representations

Kaushik Sinha, Christopher Tosh

Expand-and-sparsify representations are a class of theoretical models that capture sparse representation phenomena observed in the sensory systems of many animals. At a high level, these representations map an input $x \in \mathbb{R}^d$ to a much higher dimension $m \gg d$ via random linear projections before zeroing out all but the $k \ll m$ largest entries. The result is a $k$-sparse vector in $\{0,1\}^m$. We study the suitability of this representation for two fundamental statistical problems: density estimation and mode estimation. For density estimation, we show that a simple linear function of the expand-and-sparsify representation produces an estimator with minimax-optimal $\ell_{\infty}$ convergence rates. In mode estimation, we provide simple algorithms on top of our density estimator that recover single or multiple modes at optimal rates up to logarithmic factors under mild conditions.

LGDec 14, 2021
Federated Nearest Neighbor Classification with a Colony of Fruit-Flies: With Supplement

Parikshit Ram, Kaushik Sinha

The mathematical formalization of a neurological mechanism in the olfactory circuit of a fruit-fly as a locality sensitive hash (Flyhash) and bloom filter (FBF) has been recently proposed and "reprogrammed" for various machine learning tasks such as similarity search, outlier detection and text embeddings. We propose a novel reprogramming of this hash and bloom filter to emulate the canonical nearest neighbor classifier (NNC) in the challenging Federated Learning (FL) setup where training and test data are spread across parties and no data can leave their respective parties. Specifically, we utilize Flyhash and FBF to create the FlyNN classifier, and theoretically establish conditions where FlyNN matches NNC. We show how FlyNN is trained exactly in a FL setup with low communication overhead to produce FlyNNFL, and how it can be differentially private. Empirically, we demonstrate that (i) FlyNN matches NNC accuracy across 70 OpenML datasets, (ii) FlyNNFL training is highly scalable with low communication overhead, providing up to $8\times$ speedup with $16$ parties.

LGAug 19, 2020
Neural Neighborhood Encoding for Classification

Kaushik Sinha, Parikshit Ram

Inspired by the fruit-fly olfactory circuit, the Fly Bloom Filter [Dasgupta et al., 2018] is able to efficiently summarize the data with a single pass and has been used for novelty detection. We propose a new classifier (for binary and multi-class classification) that effectively encodes the different local neighborhoods for each class with a per-class Fly Bloom Filter. The inference on test data requires an efficient {\tt FlyHash} [Dasgupta, et al., 2017] operation followed by a high-dimensional, but {\em sparse}, dot product with the per-class Bloom Filters. The learning is trivially parallelizable. On the theoretical side, we establish conditions under which the prediction of our proposed classifier on any test example agrees with the prediction of the nearest neighbor classifier with high probability. We extensively evaluate our proposed scheme with over $50$ data sets of varied data dimensionality to demonstrate that the predictive performance of our proposed neuroscience inspired classifier is competitive the the nearest-neighbor classifiers and other single-pass classifiers.

HCFeb 15, 2018
IBeaconMap: Automated Indoor Space Representation for Beacon-Based Wayfinding

Seyed Ali Cheraghi, Vinod Namboodiri, Kaushik Sinha

Traditionally, there have been few options for navigational aids for the blind and visually impaired (BVI) in large indoor spaces. Some recent indoor navigation systems allow users equipped with smartphones to interact with low cost Bluetoothbased beacons deployed strategically within the indoor space of interest to navigate their surroundings. A major challenge in deploying such beacon-based navigation systems is the need to employ a time and labor-expensive beacon planning process to identify potential beacon placement locations and arrive at a topological structure representing the indoor space. This work presents a technique called IBeaconMap for creating such topological structures to use with beacon-based navigation that only needs the floor plans of the indoor spaces of interest. IBeaconMap employs a combination of computer vision and machine learning techniques to arrive at the required set of beacon locations and a weighted connectivity graph (with directional orientations) for subsequent navigational needs. Evaluations show IBeaconMap to be both fast and reasonably accurate, potentially proving to be an essential tool to be utilized before mass deployments of beacon-based indoor wayfinding systems of the future.

MLJul 12, 2012
Near-Optimal Algorithms for Differentially-Private Principal Components

Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha

Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data in high dimension. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method.