AR · May 6

Not All Faults Are Equal: Transient-Fault Sensitivity Characterization of an Open-Source RISC-V Vector Cluster

arXiv:2605.04803 · 33.9 · h-index: 16 · Has Code
AI Analysis

For designers of reliable RISC-V vector processors, this work provides quantitative sensitivity data to guide selective protection of datapaths, though it is an incremental characterization study.

This paper characterizes the transient-fault sensitivity of an open-source RISC-V vector cluster under SET and SEU fault models. Faulty data corruption accounts for at least 86% of manifesting errors under SET and at least 91% under SEU, FP8 arithmetic shows the lowest output impact, and exponent-targeted corruptions cause the most severe SDC events.

We present a transient-fault sensitivity study of the open-source RISC-V vector cluster Spatz under SET and SEU fault models. Across 100,000 fault injections on six MatMul and Widening MatMul configurations, faulty data corruption (FD) is the dominant manifesting outcome for all evaluated workloads, accounting for at least 86% of manifesting errors in the SET campaigns and at least 91% in the SEU campaigns. At the module level, SET sensitivity is concentrated in the vector execution path, while TCDM is the major contributor to FD manifestations. We further quantify SDC severity across FP32, FP16, BP16, and FP8 by analyzing both the average number of corrupted outputs and their RMSE. FP8 shows the lowest output impact overall, while FP16 Widening MatMul reduces both corruption spread and RMSE compared with FP16 MatMul. By contrast, the effect of widening on FP8 is limited in our experiments. Finally, exponent-targeted corruptions induce the most severe SDC events, with the largest deviations observed in FP32 and BP16, motivating selective protection of the highest-impact datapaths and fault cases.
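To make the two severity metrics and the exponent effect concrete, here is a minimal software sketch. It is not the paper's fault-injection framework: it flips a single bit in the IEEE 754 binary32 encoding of one output value, rather than injecting SET or SEU faults into the hardware design. The names `flip_bit_fp32` and `sdc_severity` and the four-element output vector are hypothetical and chosen only for illustration.

```python
import math
import struct


def flip_bit_fp32(value: float, bit: int) -> float:
    """Flip one bit of value's IEEE 754 binary32 encoding.

    Bit 31 is the sign, bits 30..23 the exponent, bits 22..0 the mantissa.
    """
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in the range 0..31")
    (word,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", word ^ (1 << bit)))
    return flipped


def sdc_severity(golden, faulty):
    """Return (number of outputs that differ, RMSE over all outputs)."""
    if len(golden) != len(faulty):
        raise ValueError("golden and faulty outputs must have equal length")
    corrupted = sum(1 for g, f in zip(golden, faulty) if g != f)
    squared_error = sum((g - f) ** 2 for g, f in zip(golden, faulty))
    return corrupted, math.sqrt(squared_error / len(golden))


# Hypothetical fault-free output vector (illustrative values only).
golden = [2.0, 3.0, 4.0, 5.0]

# Same element corrupted twice: once in the mantissa, once in the exponent.
mantissa_hit = [flip_bit_fp32(golden[0], 0)] + golden[1:]
exponent_hit = [flip_bit_fp32(golden[0], 29)] + golden[1:]

mantissa_count, mantissa_rmse = sdc_severity(golden, mantissa_hit)
exponent_count, exponent_rmse = sdc_severity(golden, exponent_hit)

print("mantissa flip:", mantissa_count, mantissa_rmse)
print("exponent flip:", exponent_count, exponent_rmse)
```

Both faults corrupt exactly one output, so the corrupted-output count cannot tell them apart. The RMSE can: the exponent flip rescales the value by a large power of two, while the mantissa flip changes only its least significant bit. This is a value-level illustration of why the two metrics are reported together, not a reproduction of the paper's measurements.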

