ML · DC · LG · Jun 8, 2015

DUAL-LOCO: Distributing Statistical Estimation Using Random Projections

arXiv:1506.02554v2 · 41 citations
Originality: Highly original
AI Analysis

This paper addresses communication bottlenecks in distributed machine learning when the data is partitioned by feature rather than by sample, and it offers a practical solution with reported speedups on real-world datasets.

The paper tackles communication-efficient distributed statistical estimation. It proposes DUAL-LOCO, which uses low-dimensional random projections in a single communication round to approximate the dependencies between features held by different workers. The approximation error is bounded and depends only weakly on the number of workers. On real-world datasets, the method obtains better speedups than a state-of-the-art distributed optimization method while retaining good accuracy.

We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependences between features available to different workers. We show that DUAL-LOCO has bounded approximation error which only depends weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real-world datasets and show that it obtains better speedups while retaining good accuracy.
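
The idea in the abstract can be illustrated with a small single-process simulation. This is a simplified sketch, not the authors' implementation. It assumes ridge regression as the estimation problem, dense Gaussian random projections, and a direct primal solve on each worker. The function name `loco_style_ridge` and the parameters `n_workers`, `proj_dim`, and `lam` are hypothetical names chosen for illustration. Each simulated worker holds one block of feature columns, compresses it once, and then fits a local model on its own raw features plus the summed compressed features of all other workers.

```python
import numpy as np


def loco_style_ridge(X, y, n_workers=4, proj_dim=20, lam=1.0, seed=0):
    """Simulate feature-distributed ridge regression with one exchange of sketches.

    Assumptions (not taken from the paper): Gaussian projections, a primal
    ridge solve per worker, and all workers simulated in a single process.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Partition the feature (column) indices across the workers.
    blocks = np.array_split(np.arange(p), n_workers)

    # Each worker compresses its own feature block to proj_dim columns.
    sketches = []
    for idx in blocks:
        # Entries have variance 1/proj_dim, so E[Pi @ Pi.T] is the identity.
        Pi = rng.standard_normal((len(idx), proj_dim)) / np.sqrt(proj_dim)
        sketches.append(X[:, idx] @ Pi)

    # The single communication round: only the n x proj_dim sketches are shared.
    total = sum(sketches)

    beta = np.zeros(p)
    for k, idx in enumerate(blocks):
        # Summed sketches of every other worker stand in for their raw features.
        others = total - sketches[k]
        X_bar = np.hstack([X[:, idx], others])
        A = X_bar.T @ X_bar + lam * np.eye(X_bar.shape[1])
        coef = np.linalg.solve(A, X_bar.T @ y)
        # Keep only the coefficients that belong to this worker's raw features.
        beta[idx] = coef[: len(idx)]
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((200, 40))
    y = X @ rng.standard_normal(40) + 0.1 * rng.standard_normal(200)
    beta_hat = loco_style_ridge(X, y, n_workers=4, proj_dim=20, lam=1.0)
    print(beta_hat.shape)
```

In this sketch each worker sends an n x `proj_dim` matrix instead of its full feature block, which is where the communication saving would come from. With `n_workers=1` the summed sketch of the other workers is all zeros, so the result reduces to ordinary ridge regression on the full data.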

