QM · ME · ML · Jun 23, 2017

Cross-validation failure: small sample sizes lead to large error bars

arXiv:1706.07581v1 · 656 citations
Originality: Synthesis-oriented
AI Analysis

The paper highlights a critical issue for neuroimaging researchers who use predictive models. The standard error across cross-validation folds is an unreliable error estimate, which can affect biomarker development and method validation in fields where samples are scarce.

The paper tackles the problem of underestimated error bars in cross-validation of predictive models in neuroimaging. It shows that small sample sizes (e.g., 100 samples) lead to large error bars, of about ±10%, which compromise the reliability of conclusions in studies where acquiring more samples is not feasible.
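A back-of-the-envelope check on where an error bar of about ±10% at 100 samples can come from. This is a minimal sketch under assumptions of this example, not necessarily the paper's derivation: the score is a classification accuracy p measured on n independent test samples, and the binomial distribution is approximated by a normal one, so the half-width of a 95% confidence interval is 1.96 × sqrt(p(1 − p) / n). The function name `binomial_ci_halfwidth` is hypothetical.

```python
import math


def binomial_ci_halfwidth(accuracy: float, n_samples: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for an
    accuracy measured on n_samples independent test samples.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


if __name__ == "__main__":
    # p = 0.5 is the worst case: it maximizes the binomial variance p * (1 - p).
    for n in (30, 100, 300, 1000):
        print(f"n = {n:4d}: +/- {binomial_ci_halfwidth(0.5, n):.1%}")
```

For n = 100 and p = 0.5 this gives ±9.8%, the same order as the figure quoted above. Because the half-width shrinks only as 1/sqrt(n), quadrupling the sample size merely halves it.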

Predictive models ground many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, or extraction of biomarkers. The principled approach to establishing their validity and usefulness is cross-validation, testing prediction on unseen data. Here, I would like to raise awareness of error bars of cross-validation, which are often underestimated. Simple experiments show that sample sizes of many neuroimaging studies inherently lead to large error bars, e.g., ±10% for 100 samples. The standard error across folds strongly underestimates them. These large error bars compromise the reliability of conclusions drawn with predictive models, such as biomarkers or methods developments where, unlike with cognitive neuroimaging MVPA approaches, more samples cannot be acquired by repeating the experiment across many subjects. Solutions to increase sample size must be investigated, tackling possible increases in heterogeneity of the data.
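The claim that the standard error across folds underestimates the error bars can be illustrated with a small simulation. This is a sketch under explicit assumptions, not the paper's experimental setup: the data are a hypothetical toy problem (two balanced classes with a 1-D Gaussian feature), the classifier is a nearest-class-mean threshold, cross-validation uses 50 random 80/20 splits, and the "standard error across folds" is taken as the standard deviation of the split scores divided by sqrt(50). The reference is the actual spread of the cross-validated accuracy across many independently drawn datasets of 100 samples. All function names are hypothetical.

```python
import math
import random
import statistics


def make_dataset(n_samples, mu, rng):
    """Hypothetical toy data: balanced classes 0/1, feature ~ N(-mu, 1) or N(+mu, 1)."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append(rng.gauss(mu if label == 1 else -mu, 1.0))
        y.append(label)
    return X, y


def fit_threshold(X, y):
    """Nearest-class-mean rule in 1-D: threshold halfway between the two class means."""
    m0 = statistics.fmean([x for x, t in zip(X, y) if t == 0])
    m1 = statistics.fmean([x for x, t in zip(X, y) if t == 1])
    # Also record which side of the threshold class 1 lies on.
    return (m0 + m1) / 2.0, m1 > m0


def accuracy(model, X, y):
    """Fraction of samples whose predicted class matches the true label."""
    threshold, class1_above = model
    correct = sum(((x > threshold) == class1_above) == (t == 1) for x, t in zip(X, y))
    return correct / len(y)


def shuffle_split_scores(X, y, n_splits, test_fraction, rng):
    """Repeated random train/test splits; returns one test accuracy per split."""
    n_test = round(test_fraction * len(y))
    indices = list(range(len(y)))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        model = fit_threshold([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in test], [y[i] for i in test]))
    return scores


def compare_error_bars(n_samples=100, n_datasets=200, n_splits=50, mu=0.7, seed=0):
    """Return (mean naive standard error across splits, actual spread of the CV estimate)."""
    rng = random.Random(seed)
    cv_estimates, naive_ses = [], []
    for _ in range(n_datasets):
        X, y = make_dataset(n_samples, mu, rng)
        scores = shuffle_split_scores(X, y, n_splits, 0.2, rng)
        cv_estimates.append(statistics.fmean(scores))
        # Naive error bar: treats the overlapping splits as independent measurements.
        naive_ses.append(statistics.stdev(scores) / math.sqrt(n_splits))
    # Actual variability of the CV estimate from one dataset to the next.
    return statistics.fmean(naive_ses), statistics.stdev(cv_estimates)


if __name__ == "__main__":
    naive_se, actual_spread = compare_error_bars()
    print(f"mean standard error across splits: {naive_se:.3f}")
    print(f"std of CV accuracy across datasets: {actual_spread:.3f}")
```

Under these assumptions the naive standard error comes out noticeably smaller than the actual spread. The splits share most of their samples, so dividing by sqrt(n_splits) as if they were independent shrinks the error bar, and adding more splits makes it shrink further without adding any information.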

Code implementations: 1 repo