Partitioning Data on Features or Samples in Communication-Efficient Distributed Optimization?
This work addresses communication efficiency in distributed optimization, which matters for large-scale machine learning applications, but the contribution appears incremental because it modifies an existing algorithm.
The paper investigates how data partitioning affects distributed optimization, comparing sample-based partitioning (as in DiSCO) with feature-based partitioning, and demonstrates that the modified algorithm for feature partitioning is efficient both theoretically and practically.
In this paper we study how the way data is partitioned affects distributed optimization. The original DiSCO algorithm [Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss, Yuchen Zhang and Lin Xiao, 2015] partitions the input data by samples. We describe how the original algorithm must be modified to allow partitioning by features, and we show its efficiency both in theory and in practice.
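To make the two layouts concrete, the following is a minimal sketch of one gradient computation under each partitioning. It is not the DiSCO algorithm or the authors' modification. It assumes a least-squares loss, simulates the nodes with a loop on one machine, and uses hypothetical function names (`gradient_by_samples`, `gradient_by_features`) chosen only for illustration. The point it shows is what must be communicated: a vector of length d (number of features) when rows are split, and a vector of length n (number of samples) when columns are split.

```python
# Illustrative only: gradient of 0.5/n * ||Xw - y||^2 under two data layouts.
# All names are hypothetical; nodes are simulated by iterating over chunks.

def matvec(X, w):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]

def split(indices, k):
    """Split a list of indices into at most k contiguous chunks."""
    size = max(1, (len(indices) + k - 1) // k)
    return [indices[s:s + size] for s in range(0, len(indices), size)]

def gradient_full(X, y, w):
    """Single-machine reference gradient."""
    n = len(X)
    r = [p - t for p, t in zip(matvec(X, w), y)]
    return [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(len(w))]

def gradient_by_samples(X, y, w, k):
    """Each node holds a block of rows and the full weight vector."""
    n, d = len(X), len(w)
    total = [0.0] * d
    for rows in split(list(range(n)), k):
        Xi = [X[i] for i in rows]
        r = [p - y[i] for p, i in zip(matvec(Xi, w), rows)]
        local = [sum(Xi[a][j] * r[a] for a in range(len(rows))) for j in range(d)]
        # Stand-in for an all-reduce over a vector of length d.
        total = [t + g for t, g in zip(total, local)]
    return [t / n for t in total]

def gradient_by_features(X, y, w, k):
    """Each node holds a block of columns and the matching block of weights."""
    n, d = len(X), len(w)
    blocks = split(list(range(d)), k)
    pred = [0.0] * n
    for cols in blocks:
        # Stand-in for an all-reduce over a vector of length n:
        # each node contributes its partial predictions.
        for i in range(n):
            pred[i] += sum(X[i][j] * w[j] for j in cols)
    r = [p - t for p, t in zip(pred, y)]
    grad = [0.0] * d
    for cols in blocks:
        # Each node fills in only its own block of the gradient, locally.
        for j in cols:
            grad[j] = sum(X[i][j] * r[i] for i in range(n)) / n
    return grad
```

Both functions return the same gradient as `gradient_full`; they differ only in which vector crosses the network. This suggests, under the assumptions above, why the preferred layout can depend on whether a dataset has more samples or more features.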