MEAPCOMLApr 11, 2021

Parallel integrative learning for large-scale multi-response regression with incomplete outcomes

arXiv:2104.05076v1
Originality Incremental advance
AI Analysis

This work addresses computational and statistical challenges in multi-task learning for big data applications, representing an incremental improvement in handling high-dimensional predictors and responses with incomplete data.

The authors tackled the problem of large-scale multi-response regression with incomplete outcomes by proposing a scalable method called PEER, which converts the problem into parallel univariate regressions and demonstrates favorable performance in estimation accuracy, variable selection, and computational efficiency compared to existing methods.

Multi-task learning is increasingly used to investigate the association structure between multiple responses and a single set of predictor variables in many applications. In the era of big data, the coexistence of incomplete outcomes, large number of responses, and high dimensionality in predictors poses unprecedented challenges in estimation, prediction, and computation. In this paper, we propose a scalable and computationally efficient procedure, called PEER, for large-scale multi-response regression with incomplete outcomes, where both the numbers of responses and predictors can be high-dimensional. Motivated by sparse factor regression, we convert the multi-response regression into a set of univariate-response regressions, which can be efficiently implemented in parallel. Under some mild regularity conditions, we show that PEER enjoys nice sampling properties including consistency in estimation, prediction, and variable selection. Extensive simulation studies show that our proposal compares favorably with several existing methods in estimation accuracy, variable selection, and computation efficiency.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes