LG, Feb 13, 2025

Mitigating multiple single-event upsets during deep neural network inference using fault-aware training

arXiv:2502.09374v1, 1 citation, h-index: 4, J Instrum
Originality: Incremental advance
AI Analysis

This work addresses the reliability of DNNs in harsh environments such as high-radiation settings, but it is an incremental advance because it builds on existing fault mitigation techniques.

The study tackled the problem of multiple single-bit upsets affecting deep neural networks in safety-critical applications by proposing fault-aware training, which improved fault tolerance by up to a factor of 3 without hardware changes.

Deep neural networks (DNNs) are increasingly used in safety-critical applications. Reliable fault analysis and mitigation are essential to ensure their functionality in harsh environments that contain high radiation levels. This study analyses the impact of multiple single-bit single-event upsets in DNNs by performing fault injection at the level of a DNN model. Additionally, a fault-aware training (FAT) methodology is proposed that improves the DNNs' robustness to faults without any modification to the hardware. Experimental results show that the FAT methodology improves the tolerance to faults by up to a factor of 3.
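
To make the idea of model-level fault injection concrete, here is a minimal sketch of flipping several single bits in a weight tensor. It is an illustration only, not the paper's implementation: the function name `inject_bit_flips`, its parameters `n_flips` and `rng`, the use of NumPy, and the assumption that weights are stored as IEEE 754 float32 are all hypothetical choices made for this example.

```python
import numpy as np

def inject_bit_flips(weights, n_flips, rng):
    """Return a copy of `weights` (as float32) with `n_flips` random single-bit flips.

    Hypothetical helper: assumes float32 storage and a uniform choice over all bits.
    """
    # Copy so the fault-free weights are left untouched.
    flat = np.ascontiguousarray(weights, dtype=np.float32).ravel().copy()
    # Reinterpret the same memory as raw 32-bit words so single bits can be toggled.
    bits = flat.view(np.uint32)
    # Draw distinct (element, bit) positions so that no flip cancels another.
    positions = rng.choice(flat.size * 32, size=n_flips, replace=False)
    for pos in positions:
        idx, bit = divmod(int(pos), 32)
        bits[idx] ^= np.uint32(1 << bit)
    return flat.reshape(np.shape(weights))

# Usage: inject three upsets into a small weight matrix.
rng = np.random.default_rng(0)
clean = np.ones((4, 5), dtype=np.float32)
faulty = inject_bit_flips(clean, n_flips=3, rng=rng)

# Count the bits that differ between the clean and faulty weights.
diff = clean.view(np.uint32) ^ faulty.view(np.uint32)
n_changed_bits = int(np.unpackbits(diff.view(np.uint8)).sum())
print(n_changed_bits)  # equals n_flips, because the positions are distinct
```

A flip in a high exponent bit can turn a small weight into a very large value or a NaN, while a flip in a low mantissa bit is nearly harmless, which is why the position of an upset matters as much as the number of upsets. One plausible way to use such a routine during training, again as an assumption rather than the authors' stated procedure, is to apply it to the weights in some forward passes so that the network learns to tolerate perturbed parameters.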

