CELG
Jun 13, 2025

CLEAN-MI: A Scalable and Efficient Pipeline for Constructing High-Quality Neurodata in Motor Imagery Paradigm

arXiv:2506.11830v1
3 citations
h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses data quality issues for researchers developing foundation models in motor imagery BCIs, though it appears incremental as it builds on existing preprocessing methods.

The paper tackles the challenge of constructing large-scale, high-quality datasets for motor imagery-based brain-computer interfaces. It proposes CLEAN-MI, a pipeline that integrates filtering and alignment techniques, and reports consistent improvements in data quality and classification performance across multiple public datasets.

The construction of large-scale, high-quality datasets is a fundamental prerequisite for developing robust and generalizable foundation models in motor imagery (MI)-based brain-computer interfaces (BCIs). However, EEG signals collected from different subjects and devices are often plagued by low signal-to-noise ratio, heterogeneity in electrode configurations, and substantial inter-subject variability, posing significant challenges for effective model training. In this paper, we propose CLEAN-MI, a scalable and systematic data construction pipeline for constructing large-scale, efficient, and accurate neurodata in the MI paradigm. CLEAN-MI integrates frequency band filtering, channel template selection, subject screening, and marginal distribution alignment to systematically filter out irrelevant or low-quality data and standardize multi-source EEG datasets. We demonstrate the effectiveness of CLEAN-MI on multiple public MI datasets, achieving consistent improvements in data quality and classification performance.
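The four stages named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the filter design, the channel template, the screening criterion, or the alignment method. The sketch assumes an FFT-mask band-pass over 8 to 30 Hz, a three-channel template, a fixed score threshold for screening, and Euclidean alignment (whitening by the mean trial covariance) as one common form of marginal distribution alignment. All function names, parameters, and default values are hypothetical.

```python
import numpy as np

# Trials are assumed to be arrays of shape (trials, channels, samples).

def bandpass_fft(x, fs, lo=8.0, hi=30.0):
    """Zero FFT bins outside [lo, hi] Hz along the time axis (simple stand-in filter)."""
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    spec = np.fft.rfft(x, axis=-1)
    spec[..., (freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=x.shape[-1], axis=-1)

def select_channels(x, names, template):
    """Keep only the template channels, in template order."""
    idx = [names.index(ch) for ch in template]  # ValueError if a channel is missing
    return x[:, idx, :]

def screen_subjects(scores, threshold):
    """Keep subjects whose quality score meets the threshold (assumed criterion)."""
    return [subj for subj, score in scores.items() if score >= threshold]

def euclidean_align(x):
    """Whiten one subject's trials by the inverse square root of their mean covariance."""
    covs = np.einsum("tcs,tds->tcd", x, x) / x.shape[-1]
    ref = covs.mean(axis=0)
    vals, vecs = np.linalg.eigh(ref)  # ref is symmetric, assumed positive definite
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return np.einsum("cd,tds->tcs", inv_sqrt, x)

def clean_subject(x, fs, names, template):
    """Apply filtering, channel selection, and alignment to one subject's trials."""
    x = bandpass_fft(x, fs)
    x = select_channels(x, names, template)
    return euclidean_align(x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    names = ["Fz", "C3", "Cz", "C4"]  # synthetic montage
    data = {s: rng.standard_normal((20, 4, 500)) for s in ("s01", "s02")}
    scores = {"s01": 0.71, "s02": 0.48}  # made-up per-subject quality scores
    kept = screen_subjects(scores, threshold=0.6)
    cleaned = {s: clean_subject(data[s], 250, names, ["C3", "Cz", "C4"]) for s in kept}
    print({s: arr.shape for s, arr in cleaned.items()})
```

After Euclidean alignment, each retained subject's mean trial covariance equals the identity matrix, which is what makes recordings from different subjects and devices more comparable before pooling them.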

