SD · LG · Apr 5

Measuring Robustness of Speech Recognition from MEG Signals Under Distribution Shift

Sheng-You Chien, Bo-Yi Mao, Yi-Ning Chang, Po-Chih Kuo

arXiv:2604.04129

AI Analysis

This work aims to improve the reliability of non-invasive speech decoding from MEG signals, with applications such as brain-computer interfaces. It is incremental, as it builds on existing benchmarks and methods.

This study tackled robust phoneme classification from MEG signals under distribution shift. Instance normalization was the most influential modification for generalization: the authors' best in-house model achieved 60.95% F1-macro on the test split, compared with 39.53% for the plain CNN baseline. MEGConformer maintained 64.09% F1-macro on both validation and test.
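For readers unfamiliar with the metric, F1-macro is the unweighted mean of per-class F1 scores, so a rare phoneme class counts as much as a frequent one. The following is a minimal pure-Python sketch on made-up labels; the function name `f1_macro` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Per-class F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels: class "a" dominates and class "b" is rare.
y_true = ["a", "a", "a", "a", "b"]
y_pred = ["a", "a", "a", "a", "a"]  # a classifier that always predicts "a"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro = f1_macro(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, F1-macro={macro:.2f}")
```

On this toy data the always-"a" classifier scores 0.80 accuracy but only about 0.44 F1-macro, which illustrates why a macro-averaged score is a stricter summary when class frequencies are unbalanced.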

This study investigates robust speech-related decoding from non-invasive MEG signals using the LibriBrain phoneme-classification benchmark from the 2025 PNPL competition. We compare residual convolutional neural networks (CNNs), an STFT-based CNN, and a CNN-Transformer hybrid, while also examining the effects of group averaging, label balancing, repeated grouping, normalization strategies, and data augmentation. Across our in-house implementations, preprocessing and data-configuration choices matter more than additional architectural complexity; among these choices, instance normalization emerges as the most influential modification for generalization. The strongest of our own models, a CNN with group averaging, label balancing, repeated grouping, and instance normalization, achieves 60.95% F1-macro on the test split, compared with 39.53% for the plain CNN baseline. However, most of our models without instance normalization show substantial validation-to-test degradation, indicating that distribution shift induced by different normalization statistics is a major obstacle to generalization in our experiments. By contrast, MEGConformer maintains 64.09% F1-macro on both validation and test, and saliency-map analysis is qualitatively consistent with this contrast: weaker models exhibit more concentrated or repetitive phoneme-sensitive patterns across splits, whereas MEGConformer appears more distributed. Overall, the results suggest that improving the reliability of non-invasive phoneme decoding will likely require better handling of normalization-related distribution shift while also addressing the challenge of single-trial decoding.
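To make two of the preprocessing steps named in the abstract concrete, the sketch below illustrates one plausible reading of instance normalization and group averaging. It rests on assumptions not confirmed by the abstract: each sample is a channels-by-time window, instance normalization is taken to mean z-scoring each channel with that sample's own statistics, and group averaging is taken to mean an element-wise average over windows that share a label. The names `instance_normalize` and `group_average` and all numbers are hypothetical; a real pipeline would likely use an array library rather than nested lists.

```python
from statistics import fmean, pstdev


def instance_normalize(sample, eps=1e-6):
    """Z-score each channel using only this sample's own mean and std.

    sample: list of channels, each a list of time points (assumed layout).
    """
    normalized = []
    for channel in sample:
        mu = fmean(channel)
        sigma = pstdev(channel, mu)
        normalized.append([(x - mu) / (sigma + eps) for x in channel])
    return normalized


def group_average(samples):
    """Element-wise average of equally shaped samples assumed to share a label."""
    n_channels = len(samples[0])
    n_times = len(samples[0][0])
    return [
        [fmean([s[c][t] for s in samples]) for t in range(n_times)]
        for c in range(n_channels)
    ]


# Toy, made-up windows: session_b is session_a with a different offset and gain,
# standing in for a shift in recording statistics between splits.
session_a = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
session_b = [[5.0 * x + 100.0 for x in channel] for channel in session_a]

norm_a = instance_normalize(session_a)
norm_b = instance_normalize(session_b)

# Largest element-wise gap after normalization (tiny; nonzero only because of eps).
max_gap = max(
    abs(a - b)
    for chan_a, chan_b in zip(norm_a, norm_b)
    for a, b in zip(chan_a, chan_b)
)
print(f"max gap after instance normalization: {max_gap:.2e}")

# Averaging two single-channel windows that share a label.
averaged = group_average([[[1.0, 3.0]], [[3.0, 5.0]]])
print(averaged)
```

Because the statistics come from each window itself, a per-channel offset or gain change between sessions largely cancels out, which is consistent with the abstract's observation that models without instance normalization degrade from validation to test. Averaging several same-label windows trades single-trial decoding for a cleaner signal, which is why the abstract still lists single-trial decoding as an open challenge.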
