LGApr 22, 2022
Federated Learning Enables Big Data for Rare Cancer Boundary DetectionSarthak Pati, Ujjwal Baid, Brandon Edwards et al.
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
IVAug 13, 2022Code
Machine Learning Based Radiomics for Glial Tumor Classification and Comparison with Volumetric AnalysisSevcan Turk, Kaya Oguz, Mehmet Orman et al.
Purpose; The purpose of this study is to classify glial tumors into grade II, III and IV categories noninvasively by application of machine learning to multi-modal MRI features in comparison with volumetric analysis. Methods; We retrospectively studied 57 glioma patients with pre and postcontrast T1 weighted, T2 weighted, FLAIR images, and ADC maps acquired on a 3T MRI. The tumors were segmented into enhancing and nonenhancing portions, tumor necrosis, cyst and edema using semiautomated segmentation of ITK-SNAP open source tool. We measured total tumor volume, enhancing-nonenhancing tumor, edema, necrosis volume and the ratios to the total tumor volume. Training of a support vector machine (SVM) classifier and artificial neural network (ANN) was performed with labeled data designed to answer the question of interest. Specificity, sensitivity, and AUC of the predictions were computed by means of ROC analysis. Differences in continuous measures between groups were assessed by using Kruskall Wallis, with post hoc Dunn correction for multiple comparisons. Results; When we compared the volume ratios between groups, there was statistically significant difference between grade IV and grade II-III glial tumors. Edema and tumor necrosis volume ratios for grade IV glial tumors were higher than that of grade II and III. Volumetric ratio analysis could not distinguish grade II and III tumors successfully. However, SVM and ANN correctly classified each group with accuracies up to 98% and 96%. Conclusion; Application of machine learning methods to MRI features can be used to classify brain tumors noninvasively and more readily in clinical settings.
CVNov 23, 2025Code
Health system learning achieves generalist neuroimaging modelsAkhil Kondepudi, Akshay Rao, Chenhui Zhao et al.
Frontier artificial intelligence (AI) models, such as OpenAI's GPT-5 and Meta's DINOv3, have advanced rapidly through training on internet-scale public data, yet such systems lack access to private clinical data. Neuroimaging, in particular, is underrepresented in the public domain due to identifiable facial features within MRI and CT scans, fundamentally restricting model performance in clinical medicine. Here, we show that frontier models underperform on neuroimaging tasks and that learning directly from uncurated data generated during routine clinical care at health systems, a paradigm we call health system learning, yields high-performance, generalist neuroimaging models. We introduce NeuroVFM, a visual foundation model trained on 5.24 million clinical MRI and CT volumes using a scalable volumetric joint-embedding predictive architecture. NeuroVFM learns comprehensive representations of brain anatomy and pathology, achieving state-of-the-art performance across multiple clinical tasks, including radiologic diagnosis and report generation. The model exhibits emergent neuroanatomic understanding and interpretable visual grounding of diagnostic findings. When paired with open-source language models through lightweight visual instruction tuning, NeuroVFM generates radiology reports that surpass frontier models in accuracy, clinical triage, and expert preference. Through clinically grounded visual understanding, NeuroVFM reduces hallucinated findings and critical errors, offering safer clinical decision support. These results establish health system learning as a paradigm for building generalist medical AI and provide a scalable framework for clinical foundation models.
CVSep 23, 2025
Learning neuroimaging models from health system-scale dataYiwei Lyu, Samir Harake, Asadur Chowdury et al.
Neuroimaging is a ubiquitous tool for evaluating patients with neurological diseases. The global demand for magnetic resonance imaging (MRI) studies has risen steadily, placing significant strain on health systems, prolonging turnaround times, and intensifying physician burnout \cite{Chen2017-bt, Rula2024-qp-1}. These challenges disproportionately impact patients in low-resource and rural settings. Here, we utilized a large academic health system as a data engine to develop Prima, the first vision language model (VLM) serving as an AI foundation for neuroimaging that supports real-world, clinical MRI studies as input. Trained on over 220,000 MRI studies, Prima uses a hierarchical vision architecture that provides general and transferable MRI features. Prima was tested in a 1-year health system-wide study that included 30K MRI studies. Across 52 radiologic diagnoses from the major neurologic disorders, including neoplastic, inflammatory, infectious, and developmental lesions, Prima achieved a mean diagnostic area under the ROC curve of 92.0, outperforming other state-of-the-art general and medical AI models. Prima offers explainable differential diagnoses, worklist priority for radiologists, and clinical referral recommendations across diverse patient demographics and MRI systems. Prima demonstrates algorithmic fairness across sensitive groups and can help mitigate health system biases, such as prolonged turnaround times for low-resource populations. These findings highlight the transformative potential of health system-scale VLMs and Prima's role in advancing AI-driven healthcare.
SIFeb 3, 2019
High-resolution home location prediction from tweets using deep learning with dynamic structureMeysam Ghaffari, Ashok Srinivasan, Xiuwen Liu
Timely and high-resolution estimates of the home locations of a sufficiently large subset of the population are critical for effective disaster response and public health intervention, but this is still an open problem. Conventional data sources, such as census and surveys, have a substantial time lag and cannot capture seasonal trends. Recently, social media data has been exploited to address this problem by leveraging its large user-base and real-time nature. However, inherent sparsity and noise, along with large estimation uncertainty in home locations, have limited their effectiveness. Consequently, much of previous research has aimed only at a coarse spatial resolution, with accuracy being limited for high-resolution methods. In this paper, we develop a deep-learning solution that uses a two-phase dynamic structure to deal with sparse and noisy social media data. In the first phase, high recall is achieved using a random forest, producing more balanced home location candidates. Then two deep neural networks are used to detect home locations with high accuracy. We obtained over 90% accuracy for large subsets on a commonly used dataset. Compared to other high-resolution methods, our approach yields up to 60% error reduction by reducing high-resolution home prediction error from over 21% to less than 8%. Systematic comparisons show that our method gives the highest accuracy both for the entire sample and for subsets. Evaluation on a real-world public health problem further validates the effectiveness of our approach.