SP · LG · Mar 18

FEMBA on the Edge: Physiologically-Aware Pre-Training, Quantization, and Deployment of a Bidirectional Mamba EEG Foundation Model on an Ultra-low Power Microcontroller

Anna Tegon, Nicholas Lehmann, Yawei Li, Andrea Cossettini, Luca Benini, Thorir Mar Ingolfsson

arXiv:2603.26716 · 48.4 · h-index: 12

Predicted impact top 13% in SP · last 90 days · Originality Highly original

AI Analysis

This work establishes the first full-stack framework for deploying large-scale EEG foundation models on ultra-low-power wearables, facilitating continuous monitoring for epilepsy and sleep disorders.

The researchers address the computational bottleneck of Transformer-based EEG foundation models in wearable neuro-monitoring with FEMBA, a bidirectional Mamba architecture that combines physiologically-aware pre-training with quantization-aware training. The model achieves a 74% memory reduction (to about 2 MB), real-time inference at 1.70 s per 5 s window, and an AUROC on TUAB of 0.893, up from 0.863 for the best contrastive baseline.
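The headline figures above imply a few derived quantities that are easy to check by hand. The short sketch below works them out; the variable names are illustrative, and the pre-compression footprint is only an estimate derived from the rounded "74%" and "about 2 MB" values, not a number reported in the summary.

```python
# Reported values (taken from the summary above).
window_s = 5.0        # length of one EEG window, in seconds
latency_s = 1.70      # inference time per window on the microcontroller
reduction = 0.74      # reported memory-footprint reduction
compressed_mb = 2.0   # reported footprint after compression (approximate)

# Real-time factor: below 1.0 means inference finishes before the next window arrives.
real_time_factor = latency_s / window_s

# Fraction of each window period not spent on inference (compute headroom).
headroom_fraction = 1.0 - real_time_factor

# Estimated footprint before compression, ASSUMING the 74% figure is relative
# to the uncompressed model; the summary does not state this value directly.
implied_original_mb = compressed_mb / (1.0 - reduction)

print(f"real-time factor: {real_time_factor:.2f}")
print(f"headroom per window: {headroom_fraction:.0%}")
print(f"implied original footprint: {implied_original_mb:.1f} MB (estimate)")
```

A real-time factor well below 1 is what makes continuous monitoring plausible: each 5 s window is processed in roughly a third of the time it takes to record.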

Objective: To enable continuous, long-term neuro-monitoring on wearable devices by overcoming the computational bottlenecks of Transformer-based Electroencephalography (EEG) foundation models and the quantization challenges inherent to State-Space Models (SSMs).

Methods: We present FEMBA, a bidirectional Mamba architecture pre-trained on over 21,000 hours of EEG. We introduce a novel Physiologically-Aware pre-training objective, consisting of reconstruction with low-pass filtering, to prioritize neural oscillations over high-frequency artifacts. To address the activation outliers common in SSMs, we employ Quantization-Aware Training (QAT) to compress the model to 2-bit weights. The framework is deployed on a parallel ultra-low-power RISC-V microcontroller (GAP9) using a custom double-buffered memory streaming scheme.

Results: The proposed low-pass pre-training improves downstream AUROC on TUAB from 0.863 to 0.893 and AUPR from 0.862 to 0.898 compared to the best contrastive baseline. QAT successfully compresses weights with negligible performance loss, whereas standard post-training quantization degrades accuracy by approximately 30%. The embedded implementation achieves deterministic real-time inference (1.70 s per 5 s window) and reduces the memory footprint by 74% (to ≈2 MB), achieving competitive accuracy with up to 27× fewer FLOPs than Transformer benchmarks.

Conclusion: FEMBA demonstrates that Mamba-based foundation models can be effectively quantized and deployed on extreme-edge hardware without sacrificing the representation quality required for robust clinical analysis.

Significance: This work establishes the first full-stack framework for deploying large-scale EEG foundation models on ultra-low-power wearables, facilitating continuous, SSM-based monitoring for epilepsy and sleep disorders.
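The idea behind the low-pass reconstruction objective described in Methods can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes the reconstruction target is a low-pass-filtered copy of the input, uses a simple moving average as a stand-in for whatever filter the paper employs, and scores predictions with mean squared error. The function names, filter width, sampling rate, and signal frequencies are all hypothetical choices made for illustration.

```python
import math

def moving_average_lowpass(signal, width):
    """Centered moving-average low-pass filter (stand-in for the real filter).

    Samples near the edges are averaged over a shortened window.
    """
    half = width // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out

def lowpass_reconstruction_loss(prediction, raw_signal, width=5):
    """MSE between a prediction and the low-pass-filtered version of the input."""
    target = moving_average_lowpass(raw_signal, width)
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)

# Toy one-second recording at an assumed 256 Hz sampling rate:
# a 10 Hz oscillation plus a 100 Hz component playing the role of an artifact.
fs = 256
slow = [math.sin(2 * math.pi * 10 * n / fs) for n in range(fs)]
fast = [0.5 * math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]
raw = [s + f for s, f in zip(slow, fast)]

# A prediction that captures only the slow oscillation scores well ...
loss_slow_only = lowpass_reconstruction_loss(moving_average_lowpass(slow, 5), raw)
# ... while a prediction that copies the raw signal, artifact included, is penalized.
loss_raw_copy = lowpass_reconstruction_loss(raw, raw)

print(f"loss when predicting the slow oscillation only: {loss_slow_only:.5f}")
print(f"loss when copying the raw signal:              {loss_raw_copy:.5f}")
```

Under these assumptions, a model gains nothing from reproducing the high-frequency component, which is the sense in which such an objective would favor neural oscillations over high-frequency artifacts.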
