Stochastic subspace correction methods and fault tolerance
For researchers in numerical linear algebra and PDE solvers, this paper provides convergence guarantees in expectation for stochastic subspace correction solvers and discusses their potential to tolerate node failures in distributed computing.
The paper proves convergence in expectation for stochastic subspace correction methods and their accelerated variants for symmetric positive-definite problems, and demonstrates their potential for fault tolerance in unreliable compute networks using overlapping domain decomposition for PDEs.
We present convergence results in expectation for stochastic subspace correction schemes and their accelerated versions for solving symmetric positive-definite variational problems, and discuss their potential for achieving fault tolerance in an unreliable compute network. To discuss the latter aspect, we use the standard overlapping domain decomposition algorithm for PDE discretizations.
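To make the setting concrete, the following is a minimal sketch of one generic stochastic subspace correction iteration, not the specific schemes or accelerated variants analyzed in the paper. It assumes a 1D Poisson finite-difference matrix as the symmetric positive-definite model problem, overlapping index blocks as the subspaces, uniform random selection of one block per step, and an exact local solve. The function names, the relaxation parameter `omega`, and the problem sizes are hypothetical choices made only for illustration.

```python
import numpy as np

def laplacian_1d(n):
    """Tridiagonal SPD matrix of the 1D Poisson finite-difference stencil."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def overlapping_blocks(n, num_blocks, overlap):
    """Contiguous index sets, each extended by `overlap` points on both sides."""
    edges = np.linspace(0, n, num_blocks + 1).astype(int)
    return [np.arange(max(lo - overlap, 0), min(hi + overlap, n))
            for lo, hi in zip(edges[:-1], edges[1:])]

def stochastic_subspace_correction(A, b, blocks, num_steps, omega=1.0, seed=0):
    """Each step picks one block uniformly at random and corrects x on it.

    Assumption: uniform sampling and exact local solves; with omega=1.0 a
    step minimizes the energy error over the chosen subspace.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros_like(b)
    for _ in range(num_steps):
        idx = blocks[rng.integers(len(blocks))]        # random subspace
        r = b[idx] - A[idx] @ x                        # residual restricted to it
        x[idx] += omega * np.linalg.solve(A[np.ix_(idx, idx)], r)
    return x

if __name__ == "__main__":
    n = 32
    A, b = laplacian_1d(n), np.ones(n)
    blocks = overlapping_blocks(n, num_blocks=4, overlap=2)
    x = stochastic_subspace_correction(A, b, blocks, num_steps=1000)
    x_star = np.linalg.solve(A, b)
    print("relative error:", np.linalg.norm(x - x_star) / np.linalg.norm(x_star))
```

In this sketch the random choice of a block stands in for whichever subdomain correction happens to be available in a given step. A subdomain whose correction never arrives is simply one that is not selected, which is one way to read the connection to unreliable compute networks.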