MLDCCOMEOct 24, 2014

Median Selection Subset Aggregation for Parallel Inference

arXiv:1410.6604v126 citations
Originality Incremental advance
AI Analysis

This addresses the problem of efficient parallel inference for massive datasets with many features, offering a practical solution with theoretical backing, though it is incremental as it builds on existing distributed methods.

The paper tackles the challenge of developing a distributed algorithm for regression and classification with many features that achieves low communication, theoretical guarantees, and excellent practical performance by proposing the MEdian Selection Subset AGgregation Estimator (message) algorithm, which demonstrates strong results in variable selection, estimation, prediction, and computation time compared to competitors.

For massive data sets, efficient computation commonly relies on distributed algorithms that store and process subsets of the data on different machines, minimizing communication costs. Our focus is on regression and classification problems involving many features. A variety of distributed algorithms have been proposed in this context, but challenges arise in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. We propose a MEdian Selection Subset AGgregation Estimator (message) algorithm, which attempts to solve these problems. The algorithm applies feature selection in parallel for each subset using Lasso or another method, calculates the `median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves very minimal communication, scales efficiently in both sample and feature size, and has theoretical guarantees. In particular, we show model selection consistency and coefficient estimation efficiency. Extensive experiments show excellent performance in variable selection, estimation, prediction, and computation time relative to usual competitors.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes