LG | AI | AR | Jan 22, 2024

Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM

arXiv:2401.11664v1 | 8 citations | h-index: 16
Originality: Incremental advance
AI Analysis

This paper addresses fault tolerance for language models deployed on ReRAM hardware, offering an incremental improvement in space efficiency.

The paper tackles hardware failures, such as stuck-at-fault defects, that cause prediction errors in Transformer language models running on ReRAM. It proposes a zero-space-cost fault protection mechanism that combines differentiable structure pruning, weight duplication with voting, and MSB embedding, and it evaluates the method on nine GLUE tasks with BERT.

Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs) due to its support for parallel in-situ matrix-vector multiplication. However, hardware failures, such as stuck-at-fault defects, can result in significant prediction errors during model inference. While additional crossbars can be used to address these failures, they come with storage overhead and are not efficient in terms of space, energy, and cost. In this paper, we propose a fault protection mechanism that incurs zero space cost. Our approach includes: 1) differentiable structure pruning of rows and columns to reduce model redundancy, 2) weight duplication and voting for robust output, and 3) embedding duplicated most significant bits (MSBs) into the model weight. We evaluate our method on nine tasks of the GLUE benchmark with the BERT model, and experimental results prove its effectiveness.
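The weight duplication and voting step can be illustrated with a small sketch. The abstract does not spell out the paper's exact scheme, so the following is a generic bitwise majority vote over three copies, not the authors' implementation. It assumes 8-bit unsigned quantized weights and three stored copies, and the function names `apply_stuck_at` and `majority_vote` are hypothetical.

```python
import numpy as np

def apply_stuck_at(weights, sa0_mask, sa1_mask):
    """Simulate stuck-at faults on uint8 weights.

    Bits set in sa0_mask are forced to 0 (stuck-at-0);
    bits set in sa1_mask are forced to 1 (stuck-at-1).
    """
    return (weights & ~sa0_mask) | sa1_mask

def majority_vote(a, b, c):
    """Bitwise majority of three uint8 arrays: a bit is 1 if at least two copies agree on 1."""
    return (a & b) | (a & c) | (b & c)

# Two illustrative 8-bit weights (assumed quantization, not from the paper).
weights = np.array([0b10110010, 0b01001101], dtype=np.uint8)
zeros = np.zeros_like(weights)

# Copy A: most significant bit of the first weight is stuck at 0.
copy_a = apply_stuck_at(weights, np.array([0b10000000, 0], dtype=np.uint8), zeros)
# Copy B: two different bits are stuck at 1.
copy_b = apply_stuck_at(weights, zeros, np.array([0b00000001, 0b00100000], dtype=np.uint8))
# Copy C: fault-free.
copy_c = apply_stuck_at(weights, zeros, zeros)

recovered = majority_vote(copy_a, copy_b, copy_c)

# Counter-example: the same bit is faulty in two copies, so the vote is wrong.
unrecovered = majority_vote(copy_a, copy_a, copy_c)
```

Voting recovers the original value whenever each bit position is corrupted in at most one of the three copies. If the same bit is faulty in two copies, as in `unrecovered`, the vote returns the wrong bit. In the paper's setting, the room for the duplicates is described as coming from pruning rather than from additional crossbars.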

