Same Brain, Different Prediction: How Preprocessing Choices Undermine EEG Decoding Reliability
For researchers using deep learning on EEG data, this work highlights a critical but overlooked source of unreliability and provides practical tools to address it.
EEG deep learning predictions are highly unstable across preprocessing pipelines: across six datasets, up to 42% of trial-level predictions flip when only the preprocessing changes. The authors introduce tools to measure, decompose, and reduce this instability: a Walsh-Hadamard decomposition of the pipeline space, a per-trial diagnostic (Preprocessing Uncertainty, PU), and a regularizer (NA-PGI).
Electroencephalography (EEG) is a cornerstone of brain-computer interfaces and clinical neuroscience, yet deep learning models are typically trained and evaluated under a single, unreported preprocessing pipeline. We formalize preprocessing choices as a counterfactual intervention space and show that EEG predictions are surprisingly unstable over this space: across six datasets spanning four paradigms, up to 42% of trial-level predictions flip when only the preprocessing changes. Standard uncertainty methods do not explicitly quantify this variability because they condition on a fixed preprocessing pipeline. We provide three tools to make this instability measurable, decomposable, and reducible. First, a Walsh-Hadamard decomposition of the 2^7 pipeline space (seven binary preprocessing choices, 128 pipelines) reveals that sensitivity is near-additive in practice under the binary intervention design, enabling efficient step-by-step optimization. Second, we introduce Preprocessing Uncertainty (PU), a per-trial diagnostic that captures a dimension of instability complementary to model-based confidence. Third, we study Normalized Adaptive PGI (NA-PGI), a graph-structured regularizer that exploits the compositional structure of preprocessing interventions, as one mitigation strategy with clear scope conditions.
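To make the Walsh-Hadamard step concrete, the following is a minimal sketch of how a metric measured on all 2^7 pipelines could be split into main effects and interactions. It is not the authors' implementation. The function names, the bitmask encoding of pipelines, and the synthetic toy_metric are assumptions for illustration only, and "additive fraction" here simply means the share of non-constant variance carried by single-step coefficients.

```python
import numpy as np

N_STEPS = 7  # seven binary preprocessing choices -> 2**7 = 128 pipelines


def walsh_hadamard(values):
    """Fast Walsh-Hadamard transform, normalized by the number of pipelines.

    values[mask] is a metric for the pipeline whose bit i says whether
    step i is switched on. The result at index S (also a bitmask) is the
    coefficient for the subset of steps S; index 0 is the overall mean.
    """
    coeffs = np.array(values, dtype=float)
    n = coeffs.size
    if n == 0 or n & (n - 1):
        raise ValueError("number of pipelines must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            lo = coeffs[start:start + h].copy()
            hi = coeffs[start + h:start + 2 * h].copy()
            coeffs[start:start + h] = lo + hi
            coeffs[start + h:start + 2 * h] = lo - hi
        h *= 2
    return coeffs / n


def additive_fraction(coeffs):
    """Share of non-constant variance explained by single-step effects."""
    orders = np.array([bin(i).count("1") for i in range(coeffs.size)])
    energy = coeffs ** 2
    total = energy[orders >= 1].sum()
    if total == 0.0:
        return float("nan")  # constant metric: nothing to attribute
    return float(energy[orders == 1].sum() / total)


def toy_metric(mask, weights, interaction=0.0):
    """Synthetic stand-in for a measured metric (hypothetical, not real data)."""
    bits = [(mask >> i) & 1 for i in range(len(weights))]
    return float(np.dot(weights, bits) + interaction * bits[0] * bits[1])


if __name__ == "__main__":
    weights = np.array([0.10, 0.05, 0.02, 0.08, 0.01, 0.03, 0.04])
    values = [toy_metric(m, weights, interaction=0.02)
              for m in range(2 ** N_STEPS)]
    coeffs = walsh_hadamard(values)
    print("additive fraction:", additive_fraction(coeffs))
```

An additive fraction close to 1 means each preprocessing step can be tuned largely independently of the others, which is the practical consequence of near-additivity described above. Values well below 1 would indicate that steps interact and have to be searched jointly.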