LG · Feb 5

Accelerating Benchmarking of Functional Connectivity Modeling via Structure-aware Core-set Selection

Ling Zhan, Zhen Li, Junjie Huang, Tao Jia

arXiv:2602.05667v1 · 2.71 citations · h-index: 2 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of computationally prohibitive benchmarking for reproducible neuroscience, making large-scale comparisons feasible, though it is incremental as it builds on existing core-set selection methods.

The paper tackles the computational bottleneck in benchmarking functional connectivity modeling methods on large-scale fMRI datasets by proposing a core-set selection method that preserves model performance rankings. Using only 10% of the data, it improves ranking consistency (nDCG@k) by up to 23.2% over prior core-set selection methods.
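Ranking consistency between a full-data benchmark and a core-set benchmark can be measured with nDCG@k, the metric the abstract reports. The sketch below is a minimal illustration, not the authors' evaluation code. It assumes each operator's full-data score serves as its relevance. The operator names and all score values are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k entries."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(full_scores, coreset_scores, k):
    """Compare a core-set ranking of operators against the full-data ranking.

    Both arguments map operator name -> performance score.
    Assumption: the full-data score is used as the relevance of each operator.
    """
    # Order operators as the core-set evaluation would rank them (best first).
    predicted = sorted(coreset_scores, key=coreset_scores.get, reverse=True)
    gains = [full_scores[op] for op in predicted]
    # The ideal ordering is the full-data ranking itself.
    ideal = sorted(full_scores.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical operators and scores, for illustration only.
full = {"pearson": 0.80, "partial_corr": 0.75, "mutual_info": 0.60, "coherence": 0.55}
core = {"pearson": 0.78, "partial_corr": 0.79, "mutual_info": 0.58, "coherence": 0.50}

score = ndcg_at_k(full, core, k=3)
print(f"nDCG@3 = {score:.3f}")
```

A value of 1.0 means the core-set reproduces the top-k ordering of the full benchmark. Here the swap of the two best operators pulls the score slightly below 1.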

Benchmarking the hundreds of functional connectivity (FC) modeling methods on large-scale fMRI datasets is critical for reproducible neuroscience. However, the combinatorial explosion of model-data pairings makes exhaustive evaluation computationally prohibitive, preventing such assessments from becoming a routine pre-analysis step. To break this bottleneck, we reframe FC benchmarking as the selection of a small, representative core-set whose sole purpose is to preserve the relative performance ranking of FC operators. We formalize this as a ranking-preserving subset selection problem and propose Structure-aware Contrastive Learning for Core-set Selection (SCLCS), a self-supervised framework to select these core-sets. SCLCS first uses an adaptive Transformer to learn each sample's unique FC structure. It then introduces a novel Structural Perturbation Score (SPS) to quantify the stability of these learned structures during training, identifying samples that represent foundational connectivity archetypes. Finally, while SCLCS identifies stable samples via a top-k ranking, we further introduce a density-balanced sampling strategy as a necessary correction to promote diversity, ensuring the final core-set is both structurally robust and distributionally representative. On the large-scale REST-meta-MDD dataset, SCLCS preserves the ground-truth model ranking with just 10% of the data, outperforming state-of-the-art (SOTA) core-set selection methods by up to 23.2% in ranking consistency (nDCG@k). To our knowledge, this is the first work to formalize core-set selection for FC operator benchmarking, thereby making large-scale operator comparisons a feasible and integral part of computational neuroscience. Code is publicly available at https://github.com/lzhan94swu/SCLCS
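The abstract notes that a plain top-k pick over stability scores needs a diversity correction. The sketch below shows why, using a simple stand-in: top-k applied within groups, with quotas proportional to group size. This is not the paper's density-balanced sampling strategy or its SPS computation. The function names, the group labels, and the stability values are all hypothetical, and it assumes a higher score means a more stable sample.

```python
from collections import defaultdict


def top_k(stability, budget):
    """Plain top-k: indices of the `budget` most stable samples."""
    order = sorted(range(len(stability)), key=lambda i: stability[i], reverse=True)
    return order[:budget]


def stratified_top_k(stability, groups, budget):
    """Top-k within each group, with quotas proportional to group size.

    Assumption: `groups` are precomputed labels (e.g. clusters in an embedding
    space). Each group gets at least one slot, so the result can deviate
    slightly from `budget` because of rounding.
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    n = len(stability)
    selected = []
    for members in by_group.values():
        quota = max(1, round(budget * len(members) / n))
        members.sort(key=lambda i: stability[i], reverse=True)
        selected.extend(members[:quota])
    return sorted(selected)


# Hypothetical stability scores and group labels, for illustration only.
stability = [0.90, 0.80, 0.85, 0.70, 0.30, 0.20, 0.40, 0.10, 0.35, 0.25]
groups = ["a", "a", "a", "a", "b", "b", "b", "c", "c", "c"]

plain = top_k(stability, budget=3)
balanced = stratified_top_k(stability, groups, budget=3)
print("plain top-k:", plain)
print("stratified :", balanced)
```

In this toy input the plain top-k selection draws every sample from group "a", while the stratified variant keeps the most stable sample from each group. That is the kind of coverage a diversity correction is meant to restore.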

