Robust Regression with Ensembles Communicating over Noisy Channels
This work addresses reliability challenges in distributed machine learning on edge devices, but its contribution is incremental, since it builds on existing ensemble methods.
The paper tackles distributed regression over noisy communication channels. It develops methods that optimize the aggregation coefficients with respect to the (possibly correlated) channel-noise parameters, applies them to the state-of-the-art ensemble methods bagging and gradient boosting, and demonstrates their effectiveness on synthetic and real-world datasets.
As machine-learning models grow in size, a single computer system can no longer meet their implementation requirements. This observation motivates distributed settings, in which intermediate computations are performed across a network of processing units, while a central node only aggregates their outputs. However, distributing inference tasks across low-precision or faulty edge devices, operating over a network of noisy communication channels, gives rise to serious reliability challenges. We study an ensemble of devices, each implementing a regression algorithm, that communicate through channels with additive noise in order to collaboratively perform a joint regression task. We define the problem formally and develop methods for optimizing the aggregation coefficients given the parameters of the channel noise, which may be correlated. Our results apply to two leading state-of-the-art ensemble regression methods: bagging and gradient boosting. We demonstrate the effectiveness of our algorithms on both synthetic and real-world datasets.
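The abstract does not state the exact objective, so the following is only a minimal sketch under an assumed model, not the paper's method. Assume each base regressor t sends its prediction F[i, t] through a channel that adds zero-mean noise z ~ N(0, Sigma), independent of the data and possibly correlated across channels, and that the central node outputs alpha @ (F[i] + z). Under these assumptions the expected squared error is ||y - F alpha||^2 / n + alpha^T Sigma alpha, a convex quadratic whose minimizer solves (F^T F / n + Sigma) alpha = F^T y / n. The helper names `optimal_aggregation`, `expected_mse` and `simulated_mse`, as well as the toy data and noise covariance, are hypothetical. The sketch compares the optimized weights against uniform averaging (as in plain bagging). It does not model the sequential structure of gradient boosting.

```python
import numpy as np


def optimal_aggregation(F, y, Sigma):
    """Aggregation weights minimizing expected MSE under additive channel noise.

    Assumed model (illustrative, not necessarily the paper's formulation):
    F[i, t] is base regressor t's prediction on sample i; the central node
    receives F[i, t] + z_t with z ~ N(0, Sigma), zero-mean, independent of
    the data, possibly correlated across channels; output = alpha @ (F[i] + z).
    E[MSE] = ||y - F alpha||^2 / n + alpha^T Sigma alpha, minimized by solving
    (F^T F / n + Sigma) alpha = F^T y / n.
    """
    n = F.shape[0]
    A = F.T @ F / n + Sigma
    b = F.T @ y / n
    return np.linalg.solve(A, b)


def expected_mse(F, y, Sigma, alpha):
    """Closed-form expected MSE of the noisy aggregate under the assumed model."""
    residual = y - F @ alpha
    return residual @ residual / len(y) + alpha @ Sigma @ alpha


def simulated_mse(F, y, Sigma, alpha, reps=200, seed=1):
    """Monte Carlo estimate: pass every prediction through the noisy channels."""
    rng = np.random.default_rng(seed)
    n, T = F.shape
    # One correlated noise vector per (repetition, sample) across the T channels.
    Z = rng.multivariate_normal(np.zeros(T), Sigma, size=(reps, n))
    noisy_pred = (F[None, :, :] + Z) @ alpha  # shape (reps, n)
    return np.mean((y[None, :] - noisy_pred) ** 2)


# Hypothetical toy setup: three base regressors that are noisy copies of y.
rng = np.random.default_rng(0)
n, T = 500, 3
y = rng.normal(size=n)
F = y[:, None] + 0.3 * rng.normal(size=(n, T))
# Channels 0 and 1 are mildly noisy and correlated; channel 2 is very noisy.
Sigma = np.array([[0.2, 0.1, 0.0],
                  [0.1, 0.2, 0.0],
                  [0.0, 0.0, 2.0]])

alpha_opt = optimal_aggregation(F, y, Sigma)
alpha_avg = np.full(T, 1.0 / T)  # uniform averaging, as in plain bagging

print("optimized weights:", np.round(alpha_opt, 3))
print("expected MSE, optimized:", expected_mse(F, y, Sigma, alpha_opt))
print("expected MSE, uniform:  ", expected_mse(F, y, Sigma, alpha_avg))
print("simulated MSE, optimized:", simulated_mse(F, y, Sigma, alpha_opt))
```

Under this assumed model the noise term alpha^T Sigma alpha acts as a generalized ridge penalty. Channels with large noise variance are down-weighted, and correlated channels are penalized jointly rather than independently.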