IV · CV · Jun 17, 2025

A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning

arXiv:2506.14432v1 · 6 citations · h-index: 50
Originality: Synthesis-oriented
AI Analysis

This dataset addresses a bottleneck in medical imaging research by giving researchers a resource for developing and benchmarking self-supervised learning methods at scale. The contribution is incremental, since it aggregates existing data from public sources.

The authors address the lack of large-scale, heterogeneous brain MRI datasets for self-supervised learning by introducing FOMO60K, a dataset of 60,529 scans from 11,187 subjects aggregated from 16 sources. It includes both clinical and research images and covers a range of anatomical and pathological variability, supporting method development and benchmarking.

We present FOMO60K, a large-scale, heterogeneous dataset of 60,529 brain Magnetic Resonance Imaging (MRI) scans from 13,900 sessions and 11,187 subjects, aggregated from 16 publicly available sources. The dataset includes both clinical- and research-grade images, multiple MRI sequences, and a wide range of anatomical and pathological variability, including scans with large brain anomalies. Minimal preprocessing was applied to preserve the original image characteristics while reducing barriers to entry for new users. Accompanying code for self-supervised pretraining and finetuning is provided. FOMO60K is intended to support the development and benchmarking of self-supervised learning methods in medical imaging at scale.

