Fixing by Mixing: A Recipe for Optimal Byzantine ML under Heterogeneity
This work addresses resilience in distributed learning when the data is heterogeneous, a setting that is common in practice but that prior work handled suboptimally.
The paper tackles Byzantine machine learning under data heterogeneity, where misbehaving machines are hard to distinguish from non-Byzantine outliers. It introduces nearest neighbor mixing (NNM) to adapt existing solutions to this setting, achieving optimal theoretical guarantees and significantly outperforming state-of-the-art methods empirically.
Byzantine machine learning (ML) aims to ensure the resilience of distributed learning algorithms to misbehaving (or Byzantine) machines. Although this problem has received significant attention, prior works often assume the data held by the machines to be homogeneous, which is seldom true in practical settings. Data heterogeneity makes Byzantine ML considerably more challenging, since a Byzantine machine can hardly be distinguished from a non-Byzantine outlier. A few solutions have been proposed to tackle this issue, but these provide suboptimal probabilistic guarantees and fare poorly in practice. This paper closes the theoretical gap, achieving optimality and yielding good empirical results. Specifically, we show how to automatically adapt existing solutions for (homogeneous) Byzantine ML to the heterogeneous setting through a powerful mechanism we call nearest neighbor mixing (NNM), which boosts any standard robust distributed gradient descent variant to yield optimal Byzantine resilience under heterogeneity. We obtain similar guarantees (in expectation) by plugging NNM into the distributed stochastic heavy ball method, a practical substitute for distributed gradient descent. We obtain empirical results that significantly outperform state-of-the-art Byzantine ML solutions.
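To make the idea of mixing concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not spell out the NNM rule, so the sketch assumes that, with n machines of which at most f are Byzantine, each submitted vector is replaced by the average of its n - f nearest neighbors (itself included) before a standard robust aggregator is applied. The function names `nnm`, `coordinate_wise_median`, and `robust_aggregate` are hypothetical, and coordinate-wise median is only one possible choice of aggregator.

```python
from statistics import median


def nnm(vectors, f):
    """Assumed NNM rule: replace each vector by the mean of its
    n - f nearest neighbors (itself included), in Euclidean distance."""
    n = len(vectors)
    k = n - f
    if k <= 0:
        raise ValueError("need more machines than Byzantine machines")
    mixed = []
    for x in vectors:
        # Sort all vectors by squared Euclidean distance to x; x itself is at distance 0.
        nearest = sorted(
            vectors,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, y)),
        )[:k]
        # Average the k nearest vectors coordinate by coordinate.
        mixed.append([sum(col) / k for col in zip(*nearest)])
    return mixed


def coordinate_wise_median(vectors):
    """A standard robust aggregator, used here as an illustrative choice."""
    return [median(col) for col in zip(*vectors)]


def robust_aggregate(vectors, f):
    """Pre-process with the assumed NNM rule, then aggregate."""
    return coordinate_wise_median(nnm(vectors, f))


# Four honest gradients that disagree (heterogeneous data) and one Byzantine vector.
gradients = [
    [1.0, 1.0],
    [1.5, 0.5],
    [0.5, 1.5],
    [1.0, 1.0],
    [100.0, -100.0],  # Byzantine
]
print(robust_aggregate(gradients, f=1))  # [1.0, 1.0]
```

In this toy example the mixing step pulls every honest vector to the honest mean and drags the Byzantine vector toward the honest cluster, so the aggregator that follows sees inputs with much less spread. Any other robust aggregation rule could be substituted for the median in `robust_aggregate`.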