DC · LG · Nov 16, 2022

Impact of Redundancy on Resilience in Distributed Optimization and Learning

arXiv:2211.08622v2 · 3 citations · h-index: 71
Originality: Incremental advance
AI Analysis

This paper addresses robustness in distributed machine learning systems where agents may be unreliable. The advance is incremental: it extends existing resilience concepts.

The paper tackles resilient distributed optimization and learning in systems with Byzantine faulty agents and slow agents (stragglers). It shows, both theoretically and empirically, that an approximate solution with error O(ε) can be achieved when the local cost functions are sufficiently redundant.

This report considers the problem of resilient distributed optimization and stochastic learning in a server-based architecture. The system comprises a server and multiple agents, where each agent has its own local cost function. The agents collaborate with the server to find a minimum of the aggregate of the local cost functions. In the context of stochastic learning, the local cost of an agent is the loss function computed over the data at that agent. In this report, we consider this problem in a system wherein some of the agents may be Byzantine faulty and some of the agents may be slow (also called stragglers). In this setting, we investigate the conditions under which it is possible to obtain an "approximate" solution to the above problem. In particular, we introduce the notion of $(f, r; ε)$-resilience to characterize how well the true solution is approximated in the presence of up to $f$ Byzantine faulty agents, and up to $r$ slow agents (or stragglers) -- smaller $ε$ represents a better approximation. We also introduce a measure named $(f, r; ε)$-redundancy to characterize the redundancy in the cost functions of the agents. Greater redundancy allows for a better approximation when solving the problem of aggregate cost minimization. In this report, we constructively show (both theoretically and empirically) that $(f, r; \mathcal{O}(ε))$-resilience can indeed be achieved in practice, given that the local cost functions are sufficiently redundant.
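The server-based setting described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the algorithm from the report: the quadratic local costs, the values of `n`, `f`, and `r`, the way faulty agents misbehave, and the coordinate-wise trimmed mean used as the robust aggregation rule are all choices made here for illustration. The server proceeds with only `n - r` responses per round (to tolerate stragglers) and filters them to tolerate up to `f` Byzantine agents. The local minimizers are placed close together to mimic redundant cost functions.

```python
import numpy as np


def trimmed_mean(grads, f):
    """Coordinate-wise trimmed mean: per coordinate, drop the f smallest
    and f largest values, then average the rest. (Illustrative filter,
    assumed here; the report may use a different rule.)"""
    g = np.sort(np.asarray(grads, dtype=float), axis=0)
    return g[f:len(g) - f].mean(axis=0)


def run(n=10, f=1, r=2, steps=200, lr=0.1, seed=0):
    """Simulate server-based gradient descent with f Byzantine agents
    and up to r stragglers per round. All parameters are hypothetical."""
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -2.0])
    # Redundant local costs (assumption): Q_i(x) = 0.5 * ||x - c_i||^2,
    # with every c_i close to a shared target.
    centers = target + 0.01 * rng.standard_normal((n, 2))
    byzantine = set(range(f))  # hypothetical: the first f agents are faulty
    x = np.zeros(2)
    for _ in range(steps):
        # Stragglers: the server uses only n - r responses each round,
        # simulated here by a random subset of agents.
        responders = rng.choice(n, size=n - r, replace=False)
        grads = []
        for i in responders:
            if int(i) in byzantine:
                # A faulty agent sends an arbitrary vector.
                grads.append(rng.normal(scale=100.0, size=2))
            else:
                # Honest gradient of Q_i at the current estimate x.
                grads.append(x - centers[int(i)])
        x = x - lr * trimmed_mean(grads, f)
    return x, target


if __name__ == "__main__":
    x, target = run()
    print("estimate:", x, "distance to target:", np.linalg.norm(x - target))
```

In this sketch the final estimate stays close to the shared target because the honest gradients nearly agree, so trimming the extremes removes the influence of the faulty agent. Spreading the `c_i` further apart (less redundancy) would be expected to enlarge the gap, which is the qualitative trade-off the abstract describes.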

