ML · May 22, 2017

Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction

arXiv:1705.07585v2 · 19 citations
AI Analysis

This paper addresses the need for interpretable and predictive analysis in data-driven scientific discovery. It offers a flexible framework that could benefit fields such as biomedicine, though the contribution appears incremental because it builds on existing model selection methods.

The paper addresses the challenge of developing statistical methods for large scientific data that are both interpretable and predictive. It introduces the Union of Intersections (UoI) framework, which achieves low-variance feature estimation while maintaining high prediction accuracy. The authors demonstrate this by extracting functional networks from human electrophysiology recordings and by predicting phenotypes from a reduced set of features.

The increasing size and complexity of scientific data could dramatically enhance discovery and prediction for basic scientific applications. Realizing this potential, however, requires novel statistical analysis methods that are both interpretable and predictive. We introduce Union of Intersections (UoI), a flexible, modular, and scalable framework for enhanced model selection and estimation. Methods based on UoI perform model selection and model estimation through intersection and union operations, respectively. We show that UoI-based methods achieve low-variance and nearly unbiased estimation of a small number of interpretable features, while maintaining high-quality prediction accuracy. We perform extensive numerical investigation to evaluate a UoI algorithm ($UoI_{Lasso}$) on synthetic and real data. In doing so, we demonstrate the extraction of interpretable functional networks from human electrophysiology recordings as well as accurate prediction of phenotypes from genotype-phenotype data with reduced features. We also show (with the $UoI_{L1Logistic}$ and $UoI_{CUR}$ variants of the basic framework) improved prediction parsimony for classification and matrix factorization on several benchmark biomedical data sets. These results suggest that methods based on the UoI framework could improve interpretation and prediction in data-driven discovery across scientific fields.
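The abstract states that UoI methods select models through intersection operations and estimate them through union operations. The sketch below illustrates one way this could look for a Lasso-style variant. It is a minimal illustration, not the authors' implementation. The following details are assumptions that the abstract does not specify: the function names `lasso_cd` and `uoi_lasso`, the bootstrap counts, the penalty grid, the use of out-of-bag rows for scoring, averaging as the "union" step, and the omission of an intercept and feature standardization.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent Lasso: min (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # residual y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def uoi_lasso(X, y, lambdas, n_boot_sel=20, n_boot_est=20, seed=0):
    """Hypothetical UoI-style estimator (assumed details, see lead-in)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Selection (intersection): for each penalty, keep only the features
    # that the Lasso selects in every bootstrap resample.
    supports = []
    for lam in lambdas:
        support = np.ones(p, dtype=bool)
        for _ in range(n_boot_sel):
            idx = rng.integers(0, n, size=n)
            support &= lasso_cd(X[idx], y[idx], lam) != 0
        supports.append(support)

    # Estimation (union): on each resample, refit every candidate support
    # by ordinary least squares, keep the one with the lowest out-of-bag
    # error, then average the winners across resamples.
    estimates = np.zeros((n_boot_est, p))
    for b in range(n_boot_est):
        idx = rng.integers(0, n, size=n)
        held_out = np.setdiff1d(np.arange(n), idx)
        best_err = np.inf
        for support in supports:
            beta = np.zeros(p)
            if support.any():
                beta[support] = np.linalg.lstsq(
                    X[idx][:, support], y[idx], rcond=None
                )[0]
            err = np.mean((y[held_out] - X[held_out] @ beta) ** 2)
            if err < best_err:
                best_err, estimates[b] = err, beta
    return estimates.mean(axis=0)


# Synthetic example (made-up data): 3 informative features out of 10.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
true_beta = np.zeros(10)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.3 * rng.standard_normal(200)

beta_hat = uoi_lasso(X, y, lambdas=[0.1, 0.2, 0.4])
print(np.round(beta_hat, 2))
```

In this sketch, the intersection step is what keeps the selected feature set small, because a feature must survive every resample. The unpenalized refit in the estimation step avoids the shrinkage that the Lasso penalty would otherwise impose on the retained coefficients.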
