OS · AR · May 20

Where Linux Breaks Under Radiation: A Cross-Architecture Kernel-Level Characterization of Proton-Induced Failures in COTS SoCs

arXiv:2503.0372242.01 citations · h-index: 5
AI Analysis

For engineers deploying COTS Linux systems in orbit, this work identifies the specific kernel subsystems where radiation-induced faults originate, enabling targeted mitigations rather than blanket redundancy.

Proton irradiation of three Linux SoCs (40 nm ARM, 14 nm ARM, 40 nm RISC-V) produced 133 observed Linux failures, each traced to its originating kernel handler. Failure profiles vary sharply by node: on the 40 nm platforms, memory management and driver handlers account for 67 to 78% of events, while on the 14 nm SoC about 90% funnel through a single eMMC storage path. The 14 nm SoC also shows roughly an order of magnitude lower Linux SEFI cross section, although differences in irradiation geometry and DRAM exposure prevent attributing this to process scaling alone.

Linux is increasingly deployed in Low Earth Orbit on commercial off-the-shelf (COTS) systems-on-chip (SoCs) that were not designed for space radiation. Ionizing particles can trigger single-event functional interrupts (SEFIs) that crash the kernel without warning. Prior work mainly measured board-level cross sections, leaving unclear which Linux subsystems fail and how a single upset propagates into an operating-system-wide failure across architectures, stress conditions, and irradiation conditions. We address this gap by subjecting three Linux platforms to proton irradiation in the 20 to 58 MeV range: a Raspberry Pi Zero 2W with a 40 nm planar ARM Cortex-A53, an NXP i.MX 8M Plus with a 14 nm FinFET ARM Cortex-A53, and an OrangeCrab ECP5 FPGA hosting a VexRiscv RV32I soft core at 40 nm. Through kernel log forensics, we trace all 133 observed Linux failures, most of which have not been previously reported, to their originating kernel handlers. Failure profiles differ sharply across nodes. On the two 40 nm platforms, memory management and driver handlers account for 67 to 78% of events, while on the 14 nm SoC approximately 90% of failures funnel through a single eMMC storage path, comprising 56% filesystem failures and 34% driver failures. This shows that a single SEFI-susceptible peripheral can strongly dictate system reliability. The 14 nm SoC also shows roughly an order of magnitude lower Linux SEFI cross section, although irradiation geometry and DRAM exposure differences preclude isolating the contribution of process scaling. Reconstructed propagation chains show that, in severe events, faults can cascade through up to six kernel subsystems before terminal failure. Rather than motivating blanket redundancy, these results identify the kernel subsystem boundaries where radiation-induced faults originate, enabling targeted mitigations for hardening COTS Linux systems for orbit.
