MELGMLJul 13, 2021

For high-dimensional hierarchical models, consider exchangeability of effects across covariates instead of across datasets

arXiv:2107.06428v15 citations
Originality Incremental advance
AI Analysis

This addresses a bottleneck in high-dimensional data analysis, such as statistical genetics, by improving model performance when covariates outnumber datasets, though it is incremental as it modifies an existing hierarchical framework.

The paper tackles poor statistical performance in hierarchical Bayesian models when the number of covariates exceeds the number of datasets, proposing a model where effects are exchangeable across covariates instead of datasets. It shows this method outperforms the classic approach in high-dimensional settings, with empirical validation on multiple regression and classification problems.

Hierarchical Bayesian methods enable information sharing across multiple related regression problems. While standard practice is to model regression parameters (effects) as (1) exchangeable across datasets and (2) correlated to differing degrees across covariates, we show that this approach exhibits poor statistical performance when the number of covariates exceeds the number of datasets. For instance, in statistical genetics, we might regress dozens of traits (defining datasets) for thousands of individuals (responses) on up to millions of genetic variants (covariates). When an analyst has more covariates than datasets, we argue that it is often more natural to instead model effects as (1) exchangeable across covariates and (2) correlated to differing degrees across datasets. To this end, we propose a hierarchical model expressing our alternative perspective. We devise an empirical Bayes estimator for learning the degree of correlation between datasets. We develop theory that demonstrates that our method outperforms the classic approach when the number of covariates dominates the number of datasets, and corroborate this result empirically on several high-dimensional multiple regression and classification problems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes