Sayan Mukherjee

h-index 59

9 papers

28 citations

Novelty 39%

AI Score 25

Ranked #166,601 of 194,257 authors (top 86%); #2,874 in ML (top 85%)

9 Papers

4.3 ML Jun 6, 2023

Asymptotics of Bayesian Uncertainty Estimation in Random Features Regression

Youngsoo Baek, Samuel I. Berchuck, Sayan Mukherjee

In this paper we compare and contrast the behavior of the posterior predictive distribution with the risk of the maximum a posteriori (MAP) estimator for the random features regression model in the overparameterized regime. We focus on the variance of the posterior predictive distribution (Bayesian model average) and compare its asymptotics to that of the risk of the MAP estimator. In the regime where the model dimensions grow faster than any constant multiple of the number of samples, asymptotic agreement between these two quantities is governed by the phase transition in the signal-to-noise ratio. They also asymptotically agree with each other when the number of samples grows faster than any constant multiple of the model dimensions. Numerical simulations illustrate finer distributional properties of the two quantities for finite dimensions. We conjecture they have Gaussian fluctuations and exhibit properties similar to those found by previous authors in a Gaussian sequence model, which is of independent theoretical interest.

2.3 AP Aug 24, 2019 Code

Scalable Modeling of Spatiotemporal Data using the Variational Autoencoder: an Application in Glaucoma

Samuel I. Berchuck, Felipe A. Medeiros, Sayan Mukherjee

As big spatial data becomes increasingly prevalent, classical spatiotemporal (ST) methods often do not scale well. While methods have been developed to account for high-dimensional spatial objects, the setting where there are exceedingly large samples of spatial observations has received less attention. The variational autoencoder (VAE), an unsupervised generative model based on deep learning and approximate Bayesian inference, fills this void using a latent variable specification that is inferred jointly across the large number of samples. In this manuscript, we compare the performance of the VAE with a more classical ST method when analyzing longitudinal visual fields from a large cohort of patients in a prospective glaucoma study. Through simulation and a case study, we demonstrate that the VAE is a scalable method for analyzing ST data when the goal is to obtain accurate predictions. R code to implement the VAE can be found on GitHub: https://github.com/berchuck/vaeST.

1.2 AR Aug 2, 2021

Accelerating Markov Random Field Inference with Uncertainty Quantification

Ramin Bashizade, Xiangyu Zhang, Sayan Mukherjee et al.

Statistical machine learning has widespread application in various domains. These methods include probabilistic algorithms, such as Markov Chain Monte-Carlo (MCMC), which rely on generating random numbers from probability distributions. These algorithms are computationally expensive on conventional processors, yet their statistical properties, namely interpretability and uncertainty quantification (UQ) compared to deep learning, make them an attractive alternative approach. Therefore, hardware specialization can be adopted to address the shortcomings of conventional processors in running these applications. In this paper, we propose a high-throughput accelerator for Markov Random Field (MRF) inference, a powerful model for representing a wide range of applications, using MCMC with Gibbs sampling. We propose a tiled architecture which takes advantage of near-memory computing, and memory optimizations tailored to the semantics of MRF. Additionally, we propose a novel hybrid on-chip/off-chip memory system and logging scheme to efficiently support UQ. This memory system design is not specific to MRF models and is applicable to applications using probabilistic algorithms. In addition, it dramatically reduces off-chip memory bandwidth requirements. We implemented an FPGA prototype of our proposed architecture using high-level synthesis tools and achieved 146MHz frequency for an accelerator with 32 function units on an Intel Arria 10 FPGA. Compared to prior work on FPGA, our accelerator achieves 26X speedup. Furthermore, our proposed memory system and logging scheme to support UQ reduces off-chip bandwidth by 71% for two applications. ASIC analysis in 15nm shows our design with 2048 function units running at 3GHz outperforms GPU implementations of motion estimation and stereo vision on Nvidia RTX2080Ti by 120X-210X, occupying only 7.7% of the area.
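The Gibbs sampling step that the abstract refers to can be illustrated in software. The following is a minimal sketch, assuming a binary (Ising-type) grid MRF with 4-connected neighbours; it is not the accelerator's algorithm or memory system. The function name `gibbs_ising` and its parameters are hypothetical, and the per-site marginal estimate stands in for the kind of uncertainty information that UQ requires.

```python
import math
import random


def gibbs_ising(height, width, coupling, field, sweeps, burn_in=0, seed=0):
    """Gibbs sampling on a grid Ising MRF (illustrative sketch only).

    Energy: -coupling * sum over neighbour pairs x_i x_j - sum_i field_i x_i,
    with x_i in {-1, +1}. Returns the final state and, for each site,
    the fraction of kept sweeps in which the site was +1.
    """
    if sweeps <= burn_in:
        raise ValueError("sweeps must exceed burn_in")
    rng = random.Random(seed)
    state = [[rng.choice((-1, 1)) for _ in range(width)] for _ in range(height)]
    plus_counts = [[0] * width for _ in range(height)]

    for sweep in range(sweeps):
        for i in range(height):
            for j in range(width):
                # Sum of the 4-connected neighbours (free boundary).
                nb = 0
                if i > 0:
                    nb += state[i - 1][j]
                if i < height - 1:
                    nb += state[i + 1][j]
                if j > 0:
                    nb += state[i][j - 1]
                if j < width - 1:
                    nb += state[i][j + 1]
                # Full conditional P(x_ij = +1 | neighbours).
                local = coupling * nb + field[i][j]
                p_plus = 1.0 / (1.0 + math.exp(-2.0 * local))
                state[i][j] = 1 if rng.random() < p_plus else -1
                # Log the sample after burn-in to estimate marginals.
                if sweep >= burn_in:
                    plus_counts[i][j] += state[i][j] == 1

    kept = sweeps - burn_in
    marginals = [[c / kept for c in row] for row in plus_counts]
    return state, marginals
```

Each site update depends only on its neighbours, which is the locality that a tiled, near-memory design could exploit; keeping per-site counts instead of every sample is one simple way to summarise uncertainty without storing the full chain.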

2.3 SD May 31, 2021 Code

A Methodology for Exploring Deep Convolutional Features in Relation to Hand-Crafted Features with an Application to Music Audio Modeling

Anna K. Yanchenko, Mohammadreza Soltani, Robert J. Ravier et al.

Understanding the features learned by deep models is important from a model trust perspective, especially as deep systems are deployed in the real world. Most recent approaches for deep feature understanding or model explanation focus on highlighting input data features that are relevant for classification decisions. In this work, we instead take the perspective of relating deep features to well-studied, hand-crafted features that are meaningful for the application of interest. We propose a methodology and set of systematic experiments for exploring deep features in this setting, where input feature importance approaches for deep feature understanding do not apply. Our experiments focus on understanding which hand-crafted and deep features are useful for the classification task of interest, how robust these features are for related tasks and how similar the deep features are to the meaningful hand-crafted features. Our proposed method is general to many application areas and we demonstrate its utility on orchestral music audio data.

1.4 ML Dec 14, 2020

At the Intersection of Deep Sequential Model Framework and State-space Model Framework: Study on Option Pricing

Ziyang Ding, Sayan Mukherjee

Inference and forecasting problems for nonlinear dynamical systems arise in a variety of contexts. On the one hand, reservoir computing and deep sequential models have demonstrated efficient, robust, and superior performance in modeling simple and chaotic dynamical systems. However, their inherently deterministic nature partially undermines their robustness to noisy systems, and their inability to provide uncertainty measurement is a further shortcoming of the framework. On the other hand, the traditional state-space model framework is robust to noise and carries measured uncertainty, making it a natural complement to the reservoir computing and deep sequential model framework. We propose the unscented reservoir smoother (URS), a model that unifies deep sequential and state-space models to combine the strengths of both frameworks. Evaluated in the option pricing setting on noisy datasets, URS achieves highly competitive forecasting accuracy, especially for longer-term forecasts, along with uncertainty measurement. Further extensions and implications of URS are also discussed, with a view toward a full integration of both frameworks.

1.9 ML Nov 15, 2018

Subspace Clustering through Sub-Clusters

Weiwei Li, Jan Hannig, Sayan Mukherjee

The problem of dimension reduction is of increasing importance in modern data analysis. In this paper, we consider modeling the collection of points in a high dimensional space as a union of low dimensional subspaces. In particular, we propose a highly scalable sampling-based algorithm that clusters the entire data set by first applying spectral clustering to a small random sample and then classifying or labeling the remaining out-of-sample points. The key idea is that this random subset borrows information across the entire data set and that the problem of clustering points can be replaced with the more efficient and robust problem of "clustering sub-clusters". We provide theoretical guarantees for our procedure. The numerical results indicate we outperform other state-of-the-art subspace clustering algorithms with respect to accuracy and speed.

3.5 ML Feb 21, 2018 Code

Learning Integral Representations of Gaussian Processes

Zilong Tan, Sayan Mukherjee

We propose a representation of Gaussian processes (GPs) based on powers of the integral operator defined by a kernel function; we call these stochastic processes integral Gaussian processes (IGPs). Sample paths from IGPs are functions contained within the reproducing kernel Hilbert space (RKHS) defined by the kernel function; in contrast, sample paths from the standard GP are not functions within the RKHS. We develop computationally efficient non-parametric regression models based on IGPs. The main innovation in our regression algorithm is the construction of a low dimensional subspace that captures the information most relevant to explaining variation in the response. We use ideas from supervised dimension reduction to compute this subspace. The construction we propose yields significant improvements in the computational complexity of estimating kernel hyper-parameters and reduces the prediction variance.

3.3 ST Mar 17, 2016

Fast moment estimation for generalized latent Dirichlet models

Shiwen Zhao, Barbara E. Engelhardt, Sayan Mukherjee et al.

We develop a generalized method of moments (GMM) approach for fast parameter estimation in a new class of Dirichlet latent variable models with mixed data types. Parameter estimation via GMM has been demonstrated to have computational and statistical advantages over alternative methods, such as expectation maximization, variational inference, and Markov chain Monte Carlo. The key computational advantage of our method (MELD) is that parameter estimation does not require instantiation of the latent variables. Moreover, a representational advantage of the GMM approach is that the behavior of the model is agnostic to distributional assumptions of the observations. We derive population moment conditions after marginalizing out the sample-specific Dirichlet latent variables. The moment conditions only depend on component mean parameters. We illustrate the utility of our approach on simulated data, comparing results from MELD to alternative methods, and we show the promise of our approach through the application of MELD to several data sets.

6.1 ML Apr 13, 2015

Adaptive Randomized Dimension Reduction on Massive Data

Gregory Darnell, Stoyan Georgiev, Sayan Mukherjee et al.

The scalability of statistical estimators is of increasing importance in modern applications. One approach to implementing scalable algorithms is to compress data into a low dimensional latent space using dimension reduction methods. In this paper we develop an approach for dimension reduction that exploits the assumption of low rank structure in high dimensional data to gain both computational and statistical advantages. We adapt recent randomized low-rank approximation algorithms to provide an efficient solution to principal component analysis (PCA), and we use this efficient solver to improve parameter estimation in large-scale linear mixed models (LMM) for association mapping in statistical and quantitative genomics. A key observation in this paper is that randomization serves a dual role, improving both computational and statistical performance by implicitly regularizing the covariance matrix estimate of the random effect in a LMM. These statistical and computational advantages are highlighted in our experiments on simulated data and large-scale genomic studies.
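As a rough illustration of how a randomized low-rank approximation can serve as a PCA solver, the sketch below uses a generic Gaussian range finder with a few power iterations. It is an assumption-laden example, not the adaptive algorithm from the paper: the function name `randomized_pca` and the parameters `oversample` and `power_iters` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def randomized_pca(X, rank, oversample=10, power_iters=2, seed=0):
    """Approximate the top `rank` principal components of X (rows = samples).

    Generic randomized range-finder sketch; returns (components, variances),
    where components has shape (rank, n_features).
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)              # centre the columns
    n, d = Xc.shape
    k = min(rank + oversample, n, d)     # sketch size with oversampling

    # Random projection captures the dominant column space of Xc.
    Q, _ = np.linalg.qr(Xc @ rng.standard_normal((d, k)))

    # Power iterations sharpen the subspace when the spectrum decays slowly.
    for _ in range(power_iters):
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Z)

    # Exact SVD of the small projected matrix (k x d).
    B = Q.T @ Xc
    _, s, Vt = np.linalg.svd(B, full_matrices=False)

    components = Vt[:rank]
    variances = s[:rank] ** 2 / (n - 1)
    return components, variances
```

The expensive decomposition is applied only to the small `k x d` matrix `B`, which is where the computational saving comes from when `k` is much smaller than the number of samples. The implicit regularization effect described in the abstract is not modelled here.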