OC · LG · Oct 22, 2015

Partitioning Data on Features or Samples in Communication-Efficient Distributed Optimization?

arXiv:1510.06688v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses communication efficiency in distributed optimization, which matters for large-scale machine learning applications, but the contribution appears incremental because it modifies an existing algorithm.

The paper investigates how data partitioning affects distributed optimization, comparing sample-based partitioning (as in DiSCO) with feature-based partitioning, and demonstrates that the modified algorithm for feature partitioning is efficient both theoretically and practically.

In this paper we study how the way the data is partitioned affects distributed optimization. The original DiSCO algorithm [Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss, Yuchen Zhang and Lin Xiao, 2015] partitions the input data based on samples. We describe how the original algorithm has to be modified to allow partitioning on features, and we show its efficiency both in theory and in practice.
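To make the two partitioning schemes concrete, the following is a minimal sketch and not the DiSCO algorithm itself. It assumes a least-squares loss (1/2n)·||Xw − y||² purely for illustration, simulates the nodes with a loop in a single process, and assumes every node can see the labels `y` in the feature-partitioned case. The function names `grad_by_samples` and `grad_by_features` are hypothetical.

```python
import numpy as np

def grad_by_samples(X, y, w, n_nodes):
    """Gradient of (1/2n)||Xw - y||^2 with rows (samples) split across nodes."""
    n = X.shape[0]
    total = np.zeros_like(w, dtype=float)
    for rows in np.array_split(np.arange(n), n_nodes):
        X_k, y_k = X[rows], y[rows]        # node k holds a block of samples
        total += X_k.T @ (X_k @ w - y_k)   # local gradient, a d-vector
    return total / n                       # the sum stands in for a reduction of d-vectors

def grad_by_features(X, y, w, n_nodes):
    """Same gradient with columns (features) split across nodes."""
    n, d = X.shape
    blocks = np.array_split(np.arange(d), n_nodes)
    # Node k holds the columns in blocks[k] and the matching coordinates of w.
    # Its partial prediction is an n-vector; summing them gives Xw.
    pred = sum(X[:, cols] @ w[cols] for cols in blocks)
    residual = pred - y                    # assumption: labels y are known to all nodes
    grad = np.empty(d, dtype=float)
    for cols in blocks:
        grad[cols] = X[:, cols].T @ residual  # node k fills in only its own coordinates
    return grad / n
```

In this sketch both functions return the same vector. What changes is what would travel between nodes: with sample partitioning the nodes combine vectors of length d (the number of features), while with feature partitioning they combine vectors of length n (the number of samples) and each node ends up holding only its own block of the gradient.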

