AR · May 6

RangeGuard: Efficient, Bounded Approximate Error Correction for Reliable DNNs

arXiv:2605.04563 · 14.4 · h-index: 1
Predicted impact top 81% in AR · last 90 days · Originality: Highly original
AI Analysis

For DNN and LLM inference under high memory error rates, RangeGuard provides strong reliability with minimal redundancy, addressing the growing vulnerability of large models to multi-bit errors.

RangeGuard introduces a metadata-centric error correction framework that uses compact Range Identifiers to bound error magnitudes in DNNs, tolerating 64+ flipped bits with only 16 bits of parity without noticeable accuracy loss.

As DRAM scales in density and adopts 3D integration, raw fault rates increase and multi-bit errors are no longer rare. Such errors can severely impact Deep Neural Networks (DNNs): although DNNs tolerate small numerical perturbations, random bit flips can create extreme outliers that propagate and sharply degrade accuracy. Large Language Models (LLMs) are particularly vulnerable because attention, residual, and normalization layers can amplify and preserve a single corrupted activation across many layers, destabilizing inference. This paper introduces RangeGuard, a metadata-centric error-correcting framework that provides strong reliability and high efficiency based on bounded approximate correction. Instead of protecting raw bits, RangeGuard encodes compact Range Identifiers (RIDs) that capture the numerical range of each value. These compact metadata enable efficient use of limited redundancy and concentrate protection on range changes, which indicate harmful semantic deviations, while ignoring benign intra-range variations. Upon detecting a range change, RangeGuard restores the correct range and substitutes a representative value, ensuring that error magnitudes are bounded within the range. Based on RIDs, RangeGuard can tolerate 64+ flipped bits using only 16 bits of parity available in GPU memories without a noticeable accuracy loss. By introducing semantic range protection, RangeGuard enables reliable DNN execution even under frequent memory errors and tight redundancy budgets.
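The detect-and-substitute step described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the bucket boundaries in `BOUNDS`, the choice of the range midpoint as the representative value, and the helper names `range_id`, `representative`, `correct`, and `flip_bit` are all assumptions made for illustration. The sketch also assumes that the stored RID itself has already been recovered intact by the parity bits, which is the part the abstract does not detail.

```python
import bisect
import struct

# Hypothetical range boundaries (an assumption, not taken from the paper).
# A value x gets RID r when BOUNDS[r-1] <= x < BOUNDS[r]; RID 0 and
# RID len(BOUNDS) are the open-ended ranges below and above the table.
BOUNDS = [-4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0]


def range_id(x: float) -> int:
    """Map a value to the index of the numerical range it falls in.

    NaN compares false against every boundary, so it lands in the last range.
    """
    return bisect.bisect_right(BOUNDS, x)


def representative(rid: int) -> float:
    """Pick a stand-in value for a range (midpoint, or the edge if open-ended)."""
    if rid == 0:
        return BOUNDS[0]
    if rid == len(BOUNDS):
        return BOUNDS[-1]
    return (BOUNDS[rid - 1] + BOUNDS[rid]) / 2


def correct(stored_rid: int, read_value: float) -> float:
    """Bounded approximate correction under the assumptions above.

    If the value read from memory is still in its recorded range, keep it
    (an intra-range change is treated as benign). Otherwise substitute the
    representative of the recorded range, so the residual error cannot
    exceed the width of that range.
    """
    if range_id(read_value) == stored_rid:
        return read_value
    return representative(stored_rid)


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of the float32 encoding of x (a simple fault-injection helper)."""
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    (out,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return out
```

For example, flipping bit 30 (the top exponent bit) of the float32 encoding of 0.75 yields a finite value above 1e38. Its range no longer matches the recorded RID, so `correct` replaces it with 0.75, the midpoint of [0.5, 1). Flipping bit 0 (the lowest mantissa bit) leaves the value inside the same range, so it is passed through unchanged.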
