SE AI AR CR | Nov 3, 2025

LM-Fix: Lightweight Bit-Flip Detection and Rapid Recovery Framework for Language Models

Ahmad Tahmasivand, Noureldin Zahran, Saba Al-Sayouri, Mohammed Fouda, Khaled N. Khasawneh

arXiv:2511.02866v18.02 citations | h-index: 4 | ICCD

Originality: Incremental advance

AI Analysis

LM-Fix provides a practical, low-overhead way to keep LLMs reliable in production, addressing a specific fault-tolerance problem for AI system developers.

The paper tackles the problem of bit-flip faults in large language models by introducing LM-Fix, a lightweight framework that detects over 94% of single-bit flips and nearly 100% of multi-bit flips with 1% to 7.7% runtime overhead, and recovers from faults more than 100x faster than reloading the model.

This paper presents LM-Fix, a lightweight detection and rapid recovery framework for faults in large language models (LLMs). Existing integrity approaches are often heavy or slow for modern LLMs. LM-Fix runs a short test-vector pass and uses hash-guided checks to detect bit-flip faults, then repairs them locally without a full reload. Across multiple models, it detects over 94% of single-bit flips at TVL=200 and nearly 100% of multi-bit flips with approximately 1% to 7.7% runtime overhead; recovery is more than 100x faster than reloading. These results show a practical, low-overhead solution to keep LLMs reliable in production.
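The detect-then-repair flow described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm or code: the "model" here is an assumed integer matrix-vector product, and every name (`forward`, `digest`, `flip_bit`, `detect`, `locate_and_repair`) is hypothetical. It only shows the general idea of hashing the output of a fixed test vector to detect a flipped bit, then using per-row hashes to restore the affected row from a reference copy instead of reloading everything.

```python
import hashlib
import struct


def forward(weights, x):
    """Toy stand-in for a model: an integer matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def digest(values):
    """SHA-256 over a list of ints packed as signed 64-bit values."""
    packed = struct.pack(f"<{len(values)}q", *values)
    return hashlib.sha256(packed).hexdigest()


def flip_bit(weights, row, col, bit):
    """Inject a fault by flipping one bit of one weight in place."""
    weights[row][col] ^= 1 << bit


def detect(weights, test_vector, golden_output_hash):
    """Run the fixed test vector and compare the output hash to the golden one."""
    return digest(forward(weights, test_vector)) != golden_output_hash


def locate_and_repair(weights, golden_row_hashes, reference):
    """Restore only rows whose hash mismatches; return the repaired row indices."""
    repaired = []
    for i, row in enumerate(weights):
        if digest(row) != golden_row_hashes[i]:
            weights[i] = list(reference[i])  # local repair, no full reload
            repaired.append(i)
    return repaired


if __name__ == "__main__":
    reference = [[1, 2, 3], [4, 5, 6]]          # assumed clean copy of the weights
    weights = [list(r) for r in reference]      # in-memory copy exposed to faults
    test_vector = [1, 1, 1]                     # assumed fixed test input
    golden_out = digest(forward(reference, test_vector))
    golden_rows = [digest(r) for r in reference]

    flip_bit(weights, 0, 1, 4)                  # simulate a single-bit fault
    if detect(weights, test_vector, golden_out):
        print("repaired rows:", locate_and_repair(weights, golden_rows, reference))
```

In this toy setting a single flip always changes the output as long as the corresponding test-vector entry is nonzero; a real LLM gives no such guarantee, which is consistent with the abstract reporting detection rates below 100% for single-bit flips.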
