A Low-cost Fault Corrector for Deep Neural Networks through Range Restriction
This addresses reliability concerns for DNNs in safety-critical applications, but it is an incremental improvement focused on fault tolerance.
The paper tackles the problem of hardware transient faults causing failures in deep neural networks (DNNs) used in safety-critical domains by proposing Ranger, a low-cost fault corrector that rectifies faulty outputs without re-computation, resulting in a 3x to 50x increase in error resilience with no accuracy loss and negligible overheads.
The adoption of deep neural networks (DNNs) in safety-critical domains has engendered serious reliability concerns. A prominent example is hardware transient faults that are growing in frequency due to the progressive technology scaling, and can lead to failures in DNNs. This work proposes Ranger, a low-cost fault corrector, which directly rectifies the faulty output due to transient faults without re-computation. DNNs are inherently resilient to benign faults (which will not cause output corruption), but not to critical faults (which can result in erroneous output). Ranger is an automated transformation to selectively restrict the value ranges in DNNs, which reduces the large deviations caused by critical faults and transforms them to benign faults that can be tolerated by the inherent resilience of the DNNs. Our evaluation on 8 DNNs demonstrates Ranger significantly increases the error resilience of the DNNs (by 3x to 50x), with no loss in accuracy, and with negligible overheads.