Randomized Polar Codes for Anytime Distributed Machine Learning
This work addresses robustness to slow nodes in large-scale distributed machine learning, though it appears incremental because it builds on existing coded computation concepts.
The paper tackles the problem of distributed computing with slow nodes by introducing a framework that integrates randomized sketching and polar codes for approximate and exact linear operations, demonstrating scalability up to ImageNet-scale computations.
We present a novel distributed computing framework that is robust to slow compute nodes and is capable of both approximate and exact computation of linear operations. The proposed mechanism integrates the concepts of randomized sketching and polar codes in the context of coded computation. We propose a sequential decoding algorithm designed to handle real-valued data while maintaining low computational complexity for recovery. Additionally, we provide an anytime estimator that can generate provably accurate estimates even when the set of available node outputs is not decodable. We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization. We present the implementation of these methods on a serverless cloud computing system and provide numerical results to demonstrate their scalability in practice, including ImageNet-scale computations.
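To make the idea of an anytime estimate concrete, the following is a minimal sketch of approximate matrix multiplication with randomized sketching, where the result is formed from whichever worker outputs have arrived. It is not the paper's polar-coded scheme or its decoder: it assumes plain i.i.d. Gaussian sketches and simple averaging, and every name in it (`worker_output`, `anytime_estimate`, the problem sizes, the seeds) is hypothetical and chosen only for illustration.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def transpose(X):
    return [list(r) for r in zip(*X)]

def gaussian_sketch(m, n, rng):
    """m x n matrix with i.i.d. N(0, 1/m) entries, so that E[S^T S] = I."""
    scale = (1.0 / m) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]

def worker_output(A, B, m, seed):
    """One worker's sketched product (S A)^T (S B), an unbiased estimate of A^T B.
    Assumption: each worker draws its own independent Gaussian sketch."""
    rng = random.Random(seed)
    S = gaussian_sketch(m, len(A), rng)
    return matmul(transpose(matmul(S, A)), matmul(S, B))

def anytime_estimate(outputs):
    """Average whatever worker outputs have arrived so far."""
    k = len(outputs)
    rows, cols = len(outputs[0]), len(outputs[0][0])
    return [[sum(o[i][j] for o in outputs) / k for j in range(cols)]
            for i in range(rows)]

def frob_error(X, Y):
    """Frobenius norm of X - Y."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry)) ** 0.5

# Toy problem: estimate A^T B with A of size n x d and B of size n x p.
rng = random.Random(0)
n, d, p, m = 40, 3, 2, 8
A = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
B = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
exact = matmul(transpose(A), B)

# Simulate 100 workers; a straggler-tolerant master can stop after any prefix.
outputs = [worker_output(A, B, m, seed) for seed in range(1, 101)]
err_one = frob_error(anytime_estimate(outputs[:1]), exact)
err_all = frob_error(anytime_estimate(outputs), exact)
print("error with 1 worker:", err_one)
print("error with 100 workers:", err_all)
```

Because each worker's output is an unbiased estimate on its own, the average is usable for any subset of finished workers, and its error shrinks as more outputs arrive. Exact recovery, which the abstract attributes to polar codes with a sequential decoder, is not shown here.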