ML · Jun 10, 2018
Building Bayesian Neural Networks with Blocks: On Structure, Interpretability and Uncertainty
Hao Henry Zhou, Yunyang Xiong, Vikas Singh
We provide simple schemes to build Bayesian Neural Networks (BNNs) block by block, inspired by the recent idea of computation skeletons. We show how, by adjusting the types of blocks used within the computation skeleton, we can identify interesting relationships with Deep Gaussian Processes (DGPs), deep kernel learning (DKL), random-feature approximations, and other topics. We give strategies to approximate the posterior of such models via doubly stochastic variational inference, yielding uncertainty estimates. We give a detailed theoretical analysis and point out extensions that may be of independent interest. As a special case, we instantiate our procedure to define a Bayesian additive neural network, a promising strategy for identifying statistical interactions that has direct benefits for obtaining interpretable models.
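To make the block idea concrete, here is a minimal Python sketch under our own assumptions, not the authors' construction. Each hypothetical `RandomFeatureBlock` maps its input through random Fourier features of an RBF kernel (a standard random-feature approximation) and then through a linear layer whose weights have a factorized Gaussian distribution. Stacking two such blocks gives a random-feature approximation in the spirit of a DGP, and repeated reparameterized weight draws give Monte Carlo predictive means and standard deviations. Only the sampling half of doubly stochastic variational inference is shown; the minibatched ELBO optimization of `mu` and `log_sigma` is omitted, so those parameters stay at their N(0, 1) initialization.

```python
import numpy as np


class RandomFeatureBlock:
    """Hypothetical block: random Fourier features for an RBF kernel, followed by
    a linear layer whose weights follow a factorized Gaussian q(theta)."""

    def __init__(self, d_in, d_out, num_features, lengthscale, rng):
        # Frequencies drawn from the spectral density of exp(-||x - y||^2 / (2 l^2))
        self.omega = rng.normal(scale=1.0 / lengthscale, size=(d_in, num_features))
        self.phase = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
        self.num_features = num_features
        # Variational parameters of q(theta) = N(mu, diag(sigma^2)), left at the N(0, 1) prior
        self.mu = np.zeros((num_features, d_out))
        self.log_sigma = np.zeros((num_features, d_out))

    def features(self, X):
        # phi(x)^T phi(y) is a Monte Carlo estimate of the RBF kernel k(x, y)
        return np.sqrt(2.0 / self.num_features) * np.cos(X @ self.omega + self.phase)

    def sample(self, X, rng):
        # Reparameterized draw theta = mu + sigma * eps, then a linear map of the features
        eps = rng.standard_normal(self.mu.shape)
        theta = self.mu + np.exp(self.log_sigma) * eps
        return self.features(X) @ theta


def predictive_samples(blocks, X, num_samples, rng):
    """Monte Carlo forward passes through the stacked blocks (fresh weights per pass)."""
    outs = []
    for _ in range(num_samples):
        H = X
        for block in blocks:
            H = block.sample(H, rng)
        outs.append(H)
    return np.stack(outs)  # shape (num_samples, n, d_out)


rng = np.random.default_rng(0)
blocks = [RandomFeatureBlock(2, 3, 500, 1.0, rng), RandomFeatureBlock(3, 1, 500, 1.0, rng)]
X = rng.normal(size=(10, 2))
S = predictive_samples(blocks, X, 100, rng)
mean, std = S.mean(axis=0), S.std(axis=0)  # predictive mean and uncertainty per input
```

The spread `std` is the uncertainty estimate; training would replace the prior-level `mu` and `log_sigma` with fitted values while keeping the same sampling path.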
ML · Jan 23, 2018
Non-parametric Sparse Additive Auto-regressive Network Models
Hao Henry Zhou, Garvesh Raskutti
Consider a multivariate time series $(X_t)_{t=0}^{T}$ with $X_t \in \mathbb{R}^d$, which may represent spike-train responses of multiple neurons in a brain, crime event data across multiple regions, and many other phenomena. An important challenge associated with these time series models is to estimate an influence network between the $d$ variables, especially when the number of variables $d$ is large, i.e., in the high-dimensional setting. Prior work has focused on parametric vector auto-regressive models; however, parametric approaches are somewhat restrictive in practice. In this paper, we use the non-parametric sparse additive model (SpAM) framework to address this challenge. Using a combination of $\beta$- and $\phi$-mixing properties of Markov chains and empirical process techniques for reproducing kernel Hilbert spaces (RKHSs), we provide upper bounds on the mean-squared error in terms of the sparsity $s$, the logarithm of the dimension $\log d$, the number of time points $T$, and the smoothness of the RKHSs. Our rates are sharp up to logarithmic factors in many cases. We also provide numerical experiments that support our theoretical results and display potential advantages of our non-parametric SpAM framework on a Chicago crime dataset.
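The estimation problem can be illustrated with a small simulation. The sketch below is a stand-in built on our own assumptions rather than the paper's estimator: it expands each lagged coordinate $X_{t,k}$ in a hypothetical fixed basis $(x, x^2, \sin x, \cos x)$ instead of an RKHS, and fits one target coordinate with a group-lasso penalty (one group per candidate parent) by proximal gradient descent. Groups whose coefficients are shrunk exactly to zero are read off as absent edges of the influence network. The data-generating process, `lam`, and all function names are illustrative choices.

```python
import numpy as np


def basis(x):
    """Hypothetical fixed basis for one lagged coordinate (stand-in for an RKHS)."""
    return np.column_stack([x, x ** 2, np.sin(x), np.cos(x)])


def design(X_prev):
    """Stack one standardized basis block per candidate parent k = 0..d-1."""
    Phi = np.hstack([basis(X_prev[:, k]) for k in range(X_prev.shape[1])])
    return (Phi - Phi.mean(axis=0)) / Phi.std(axis=0)


def group_lasso(Phi, y, group_size, lam, n_iter=500):
    """Proximal gradient for (1/2n)||y - Phi b||^2 + lam * sum_k ||b_k||_2."""
    n, p = Phi.shape
    y = y - y.mean()
    step = 1.0 / np.linalg.eigvalsh(Phi.T @ Phi / n).max()  # 1 / Lipschitz constant
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b - step * (Phi.T @ (Phi @ b - y) / n)  # gradient step on the squared loss
        for g in range(p // group_size):  # group soft-thresholding (the prox step)
            s = slice(g * group_size, (g + 1) * group_size)
            norm = np.linalg.norm(z[s])
            z[s] = 0.0 if norm <= step * lam else (1.0 - step * lam / norm) * z[s]
        b = z
    return b.reshape(-1, group_size)


# Simulate X_{t+1, j} = 0.8 sin(X_{t, (j+1) mod d}) + noise, so the only parent of X_0 is X_1
rng = np.random.default_rng(1)
d, T = 5, 2000
X = np.zeros((T + 1, d))
for t in range(T):
    X[t + 1] = 0.8 * np.sin(np.roll(X[t], -1)) + 0.5 * rng.standard_normal(d)

beta = group_lasso(design(X[:-1]), X[1:, 0], group_size=4, lam=0.1)
norms = np.linalg.norm(beta, axis=1)  # one norm per candidate parent
parents = np.nonzero(norms > 1e-8)[0]
print("estimated parents of X_0:", parents)
```

Repeating the fit for every target coordinate $j$ would give the full estimated influence network.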
ME · Sep 2, 2017
When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, $\ell_2$-consistency and Neuroscience Applications
Hao Henry Zhou, Yilin Zhang, Vamsi K. Ithapu et al.
Many studies in the biomedical and health sciences involve small sample sizes due to logistical or financial constraints. Often, identifying weak (but scientifically interesting) associations between a set of predictors and a response necessitates pooling datasets from multiple diverse labs or groups. While there is a rich literature in statistical machine learning addressing distributional shifts and inference in multi-site datasets, it is less clear when such pooling is guaranteed to help (and when it is not), independent of the inference algorithm used. In this paper, we present a hypothesis test to answer this question for both classical and high-dimensional linear regression. We precisely identify regimes where pooling datasets across multiple sites is sensible, and show how such policy decisions can be made via simple checks executable on each site before any data transfer happens. With a focus on Alzheimer's disease studies, we present empirical results showing that, in regimes suggested by our analysis, pooling a local dataset with data from an international study improves power.
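As an illustration of a check that needs only per-site summaries, the sketch below has each site share its sufficient statistics $X^\top X$, $X^\top y$, $y^\top y$ and $n$ rather than raw rows. It substitutes the classical Chow F-test of equal regression coefficients across two sites; the paper's own test, which asks when pooling helps, is not reproduced here. Function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def site_summary(X, y):
    """Sufficient statistics a site can share instead of its raw rows."""
    return {"XtX": X.T @ X, "Xty": X.T @ y, "yty": float(y @ y), "n": len(y)}


def rss(s):
    """Residual sum of squares of the OLS fit, computed from summaries only."""
    beta = np.linalg.solve(s["XtX"], s["Xty"])
    return s["yty"] - s["Xty"] @ beta  # y'y - beta' X'y


def chow_test(s1, s2):
    """Chow F-test of H0: both sites share the same coefficient vector."""
    pooled = {k: s1[k] + s2[k] for k in s1}  # pooling = summing the statistics
    p = s1["XtX"].shape[0]
    rss_sep = rss(s1) + rss(s2)
    dof = s1["n"] + s2["n"] - 2 * p
    F = ((rss(pooled) - rss_sep) / p) / (rss_sep / dof)
    return F, stats.f.sf(F, p, dof)


rng = np.random.default_rng(2)
n1, n2 = 200, 150
X1 = np.column_stack([np.ones(n1), rng.normal(size=(n1, 2))])
X2 = np.column_stack([np.ones(n2), rng.normal(size=(n2, 2))])
y1 = X1 @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n1)
y2 = X2 @ np.array([1.0, 3.5, -1.0]) + rng.normal(size=n2)  # one slope differs at site 2

s1, s2 = site_summary(X1, y1), site_summary(X2, y2)
F, pval = chow_test(s1, s2)
print(f"F = {F:.2f}, p = {pval:.3g}")
```

A small p-value here signals that the sites' coefficients differ. Because the pooled statistics are plain sums of the per-site ones, the check runs without any raw data leaving either site.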