Fault Tolerance in Distributed Neural Computing
This work addresses reliability in distributed computing systems, but the contribution appears incremental: it applies the known fault tolerance of neural networks to broader computational contexts.
The paper tackles system reliability in distributed computing by analysing the fault tolerance of a distributed feed-forward neural network during both learning and operation. It reports that the network is insensitive to intermittent faults caused by unreliable communication or faulty hardware, and it investigates both the overhead caused by injected faults and the sensitivity to limited hardware failures in different areas of the network.
With the increasing complexity of computing systems, complete hardware reliability can no longer be guaranteed. We need, however, to ensure overall system reliability. One of the most important features of artificial neural networks is their intrinsic fault-tolerance. The aim of this work is to investigate whether such networks have features that can be applied to wider computational systems. This paper presents an analysis, in both the learning and operational phases, of a distributed feed-forward neural network with decentralised event-driven time management, which is insensitive to intermittent faults caused by unreliable communication or faulty hardware components. The learning rules used in the model are local in space and time, which allows efficient scalable distributed implementation. We investigate the overhead caused by injected faults and analyse the sensitivity to limited failures in the computational hardware in different areas of the network.
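The fault-injection idea in the abstract can be illustrated with a small sketch. This is not the paper's model: it has no event-driven time management and no learning rule, and every name in it (`forward`, `mean_output_deviation`, `random_weights`, `drop_prob`) is a hypothetical choice made for illustration. It assumes a plain layered network of tanh units in which each unit-to-unit message is independently lost with probability `drop_prob`, as a crude stand-in for intermittent communication faults, and it measures how far the outputs drift from the fault-free result.

```python
import math
import random


def random_weights(sizes, seed=0):
    """Build random weights; weights[l][j][i] links unit i of layer l to unit j of layer l+1."""
    rng = random.Random(seed)
    return [[[rng.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]


def forward(weights, inputs, drop_prob=0.0, rng=None):
    """Forward pass in which each message is dropped with probability drop_prob.

    A dropped message simply contributes nothing to the receiving unit's sum
    (an assumed fault model, not the one used in the paper).
    """
    if rng is None:
        rng = random.Random(0)
    activations = list(inputs)
    for layer in weights:
        next_activations = []
        for row in layer:
            total = 0.0
            for w, a in zip(row, activations):
                if rng.random() >= drop_prob:  # message delivered
                    total += w * a
            next_activations.append(math.tanh(total))
        activations = next_activations
    return activations


def mean_output_deviation(weights, inputs, drop_prob, trials=200, seed=1):
    """Average absolute difference between faulty and fault-free outputs."""
    rng = random.Random(seed)
    clean = forward(weights, inputs)
    total = 0.0
    for _ in range(trials):
        faulty = forward(weights, inputs, drop_prob, rng)
        total += sum(abs(c - f) for c, f in zip(clean, faulty)) / len(clean)
    return total / trials


if __name__ == "__main__":
    w = random_weights([4, 6, 3])
    x = [0.5, -0.2, 0.8, 0.1]
    for p in (0.0, 0.05, 0.2):
        print(p, mean_output_deviation(w, x, p))
```

Sweeping `drop_prob` in this way gives a rough picture of how gracefully a network degrades under message loss. Restricting the drops to a single layer would be one way to probe sensitivity to failures in different areas of the network.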