Prashant Shah

LG
h-index19
7papers
543citations
Novelty24%
AI Score39

7 Papers

LGApr 22, 2022
Federated Learning Enables Big Data for Rare Cancer Boundary Detection

Sarthak Pati, Ujjwal Baid, Brandon Edwards et al.

Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.

LGFeb 22, 2024Code
Privacy-Enhancing Collaborative Information Sharing through Federated Learning -- A Case of the Insurance Industry

Panyi Dong, Zhiyu Quan, Brandon Edwards et al.

The report demonstrates the benefits (in terms of improved claims loss modeling) of harnessing the value of Federated Learning (FL) to learn a single model across multiple insurance industry datasets without requiring the datasets themselves to be shared from one company to another. The application of FL addresses two of the most pressing concerns: limited data volume and data variety, which are caused by privacy concerns, the rarity of claim events, the lack of informative rating factors, etc.. During each round of FL, collaborators compute improvements on the model using their local private data, and these insights are combined to update a global model. Such aggregation of insights allows for an increase to the effectiveness in forecasting claims losses compared to models individually trained at each collaborator. Critically, this approach enables machine learning collaboration without the need for raw data to leave the compute infrastructure of each respective data owner. Additionally, the open-source framework, OpenFL, that is used in our experiments is designed so that it can be run using confidential computing as well as with additional algorithmic protections against leakage of information via the shared model updates. In such a way, FL is implemented as a privacy-enhancing collaborative learning technique that addresses the challenges posed by the sensitivity and privacy of data in traditional machine learning solutions. This paper's application of FL can also be expanded to other areas including fraud detection, catastrophe modeling, etc., that have a similar need to incorporate data privacy into machine learning collaborations. Our framework and empirical results provide a foundation for future collaborations among insurers, regulators, academic researchers, and InsurTech experts.

CVDec 5, 2025Code
The MICCAI Federated Tumor Segmentation (FeTS) Challenge 2024: Efficient and Robust Aggregation Methods for Federated Learning

Akis Linardos, Sarthak Pati, Ujjwal Baid et al.

We present the design and results of the MICCAI Federated Tumor Segmentation (FeTS) Challenge 2024, which focuses on federated learning (FL) for glioma sub-region segmentation in multi-parametric MRI and evaluates new weight aggregation methods aimed at improving robustness and efficiency. Six participating teams were evaluated using a standardized FL setup and a multi-institutional dataset derived from the BraTS glioma benchmark, consisting of 1,251 training cases, 219 validation cases, and 570 hidden test cases with segmentations for enhancing tumor (ET), tumor core (TC), and whole tumor (WT). Teams were ranked using a cumulative scoring system that considered both segmentation performance, measured by Dice Similarity Coefficient (DSC) and the 95th percentile Hausdorff Distance (HD95), and communication efficiency assessed through the convergence score. A PID-controller-based method achieved the top overall ranking, obtaining mean DSC values of 0.733, 0.761, and 0.751 for ET, TC, and WT, respectively, with corresponding HD95 values of 33.922 mm, 33.623 mm, and 32.309 mm, while also demonstrating the highest communication efficiency with a convergence score of 0.764. These findings advance the state of federated learning for medical imaging, surpassing top-performing methods from previous challenge iterations and highlighting PID controllers as effective mechanisms for stabilizing and optimizing weight aggregation in FL. The challenge code is available at https://github.com/FeTS-AI/Challenge.

LGJan 21, 2025Code
Federated Discrete Denoising Diffusion Model for Molecular Generation with OpenFL

Kevin Ta, Patrick Foley, Mattson Thieme et al.

Generating unique molecules with biochemically desired properties to serve as viable drug candidates is a difficult task that requires specialized domain expertise. In recent years, diffusion models have shown promising results in accelerating the drug design process through AI-driven molecular generation. However, training these models requires massive amounts of data, which are often isolated in proprietary silos. OpenFL is a federated learning framework that enables privacy-preserving collaborative training across these decentralized data sites. In this work, we present a federated discrete denoising diffusion model that was trained using OpenFL. The federated model achieves comparable performance with a model trained on centralized data when evaluating the uniqueness and validity of the generated molecules. This demonstrates the utility of federated learning in the drug design process. OpenFL is available at: https://github.com/securefederatedai/openfl

LGMay 13, 2021Code
OpenFL: An open-source framework for Federated Learning

G Anthony Reina, Alexey Gruzdev, Patrick Foley et al.

Federated learning (FL) is a computational paradigm that enables organizations to collaborate on machine learning (ML) projects without sharing sensitive data, such as, patient records, financial data, or classified secrets. Open Federated Learning (OpenFL https://github.com/intel/openfl) is an open-source framework for training ML algorithms using the data-private collaborative learning paradigm of FL. OpenFL works with training pipelines built with both TensorFlow and PyTorch, and can be easily extended to other ML and deep learning frameworks. Here, we summarize the motivation and development characteristics of OpenFL, with the intention of facilitating its application to existing ML model training in a production environment. Finally, we describe the first use of the OpenFL framework to train consensus ML models in a consortium of international healthcare organizations, as well as how it facilitates the first computational competition on FL.

LGFeb 26, 2021
GaNDLF: A Generally Nuanced Deep Learning Framework for Scalable End-to-End Clinical Workflows in Medical Imaging

Sarthak Pati, Siddhesh P. Thakur, İbrahim Ethem Hamamcı et al.

Deep Learning (DL) has the potential to optimize machine learning in both the scientific and clinical communities. However, greater expertise is required to develop DL algorithms, and the variability of implementations hinders their reproducibility, translation, and deployment. Here we present the community-driven Generally Nuanced Deep Learning Framework (GaNDLF), with the goal of lowering these barriers. GaNDLF makes the mechanism of DL development, training, and inference more stable, reproducible, interpretable, and scalable, without requiring an extensive technical background. GaNDLF aims to provide an end-to-end solution for all DL-related tasks in computational precision medicine. We demonstrate the ability of GaNDLF to analyze both radiology and histology images, with built-in support for k-fold cross-validation, data augmentation, multiple modalities and output classes. Our quantitative performance evaluation on numerous use cases, anatomies, and computational tasks supports GaNDLF as a robust application framework for deployment in clinical workflows.

LGMar 11, 2020
Addressing the Memory Bottleneck in AI Model Training

David Ojika, Bhavesh Patel, G. Anthony Reina et al.

Using medical imaging as case-study, we demonstrate how Intel-optimized TensorFlow on an x86-based server equipped with 2nd Generation Intel Xeon Scalable Processors with large system memory allows for the training of memory-intensive AI/deep-learning models in a scale-up server configuration. We believe our work represents the first training of a deep neural network having large memory footprint (~ 1 TB) on a single-node server. We recommend this configuration to scientists and researchers who wish to develop large, state-of-the-art AI models but are currently limited by memory.