ME · ML · Dec 26, 2019

Meta-analysis of heterogeneous data: integrative sparse regression in high-dimensions

Subha Maity, Yuekai Sun, Moulinath Banerjee

arXiv:1912.11928v25.910 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of integrating non-identical data sources for researchers in statistics and bioinformatics, offering an incremental improvement in meta-analysis methods for high-dimensional settings.

The paper tackles meta-analysis of heterogeneous high-dimensional datasets by introducing a global parameter chosen for interpretability and statistical efficiency, together with a one-shot estimator that preserves the anonymity of the data sources and converges at a rate that depends on the size of the combined dataset. For high-dimensional linear models, it reports an advantage for the proposed identification restrictions both when adapting to a previously seen data distribution and when predicting for an unseen one, and it shows benefits on a large-scale drug treatment dataset spanning several cancer cell lines.

We consider the task of meta-analysis in high-dimensional settings in which the data sources are similar but non-identical. To borrow strength across such heterogeneous datasets, we introduce a global parameter that emphasizes interpretability and statistical efficiency in the presence of heterogeneity. We also propose a one-shot estimator of the global parameter that preserves the anonymity of the data sources and converges at a rate that depends on the size of the combined dataset. For high-dimensional linear model settings, we demonstrate the superiority of our identification restrictions in adapting to a previously seen data distribution as well as predicting for a new/unseen data distribution. Finally, we demonstrate the benefits of our approach on a large-scale drug treatment dataset involving several different cancer cell-lines.
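The abstract does not spell out the estimator, so the following is only a rough sketch of what a "one-shot" scheme can look like, not the authors' method. It assumes each source fits a plain lasso locally (solved here with a hand-written ISTA loop) and shares only its coefficient vector, and that the coordinator combines those vectors with a coordinatewise median followed by hard thresholding. The lasso, the median, the threshold, the simulated data, and every function and variable name are illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=1000):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    n, d = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n is the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


def one_shot_aggregate(local_estimates, threshold):
    """Combine local estimates in a single round (assumed rule: median, then threshold)."""
    center = np.median(np.vstack(local_estimates), axis=0)
    return np.where(np.abs(center) > threshold, center, 0.0)


# Simulated heterogeneous sources (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n, d, K = 120, 150, 5            # samples per source, dimension, number of sources
theta_global = np.zeros(d)
theta_global[:3] = 2.0           # shared sparse signal

local_estimates = []
for k in range(K):
    theta_k = theta_global.copy()
    theta_k[10 + k] += 1.0       # source-specific deviation on one coordinate
    X = rng.standard_normal((n, d))
    y = X @ theta_k + 0.5 * rng.standard_normal(n)
    # Only the fitted coefficient vector leaves the source; raw (X, y) stay local.
    local_estimates.append(lasso_ista(X, y, lam=0.2))

theta_hat = one_shot_aggregate(local_estimates, threshold=0.5)
print("estimated support:", np.flatnonzero(theta_hat))
```

In this sketch the coordinator never sees raw observations, which is one simple way to read the anonymity property, and the median discards a deviation that appears in only one source. The sketch makes no attempt to reproduce the convergence rate claimed in the abstract.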
