Gaurang Sharma

LG
h-index31
3papers
1citation
Novelty35%
AI Score34

3 Papers

CVFeb 10
Simple Image Processing and Similarity Measures Can Link Data Samples across Databases through Brain MRI

Gaurang Sharma, Harri Polonen, Juha Pajula et al.

Head Magnetic Resonance Imaging (MRI) is routinely collected and shared for research under strict regulatory frameworks. These frameworks require removing potential identifiers before sharing. But, even after skull stripping, the brain parenchyma contains unique signatures that can match other MRIs from the same participants across databases, posing a privacy risk if additional data features are available. Current regulatory frameworks often mandate evaluating such risks based on the assessment of a certain level of reasonableness. Prior studies have already suggested that a brain MRI could enable participant linkage, but they have relied on training-based or computationally intensive methods. Here, we demonstrate that linking an individual's skull-stripped T1-weighted MRI, which may lead to re-identification if other identifiers are available, is possible using standard preprocessing followed by image similarity computation. Nearly perfect linkage accuracy was achieved in matching data samples across various time intervals, scanner types, spatial resolutions, and acquisition protocols, despite potential cognitive decline, simulating MRI matching across databases. These results aim to contribute meaningfully to the development of thoughtful, forward-looking policies in medical data sharing.

1.1LGApr 30
Privacy-Preserving Federated Learning via Differential Privacy and Homomorphic Encryption for Cardiovascular Disease Risk Modeling

Gaurang Sharma, Juha Pajula, Aada Illikainen et al.

Protecting sensitive health data while enabling collaborative analysis is a central challenge in healthcare. Traditional machine learning (ML) requires institutions to pool anonymized patient records, centralizing analytical development and privacy risks at a single site. Privacy-enhancing technologies (PETs), including Differential Privacy (DP) and Homomorphic Encryption (HE), can mitigate these risks. However, they are mainly studied in conventional data-sharing settings and often introduce trade-offs, including reduced model utility, higher computational cost, and increased implementation complexity. Federated Learning (FL) reduces data centralization by enabling institutions to train models locally and share only model updates. Nevertheless, FL does not eliminate privacy risks, as shared parameters or gradients may still reveal sensitive information. Integrating DP or HE into FL can strengthen privacy guarantees, yet their comparative performance and deployment implications in real-world healthcare settings remain unclear. We systematically evaluated DP and HE integration in FL under real-world conditions, comparing them with standard FL and centralized ML (cML) to quantify privacy-utility trade-offs in multi-institutional settings. Using nationwide Swedish healthcare data, we evaluated cardiovascular disease risk prediction using logistic regression (LR) and neural network (NN) learners. FL with HE achieved performance comparable to cML but introduced measurable cryptographic overhead, particularly in the NN implementation. FL with DP incurred lower computational cost; however, LR was more sensitive to calibrated noise than the NN, resulting in greater performance degradation. Our findings provide practical guidance for deploying privacy-preserving FL in fragmented healthcare systems.

LGMar 5, 2025
Federated Learning for Predicting Mild Cognitive Impairment to Dementia Conversion

Gaurang Sharma, Elaheh Moradi, Juha Pajula et al.

Dementia is a progressive condition that impairs an individual's cognitive health and daily functioning, with mild cognitive impairment (MCI) often serving as its precursor. The prediction of MCI to dementia conversion has been well studied, but previous studies have almost always focused on traditional Machine Learning (ML) based methods that require sharing sensitive clinical information to train predictive models. This study proposes a privacy-enhancing solution using Federated Learning (FL) to train predictive models for MCI to dementia conversion without sharing sensitive data, leveraging socio demographic and cognitive measures. We simulated and compared two network architectures, Peer to Peer (P2P) and client-server, to enable collaborative learning. Our results demonstrated that FL had comparable predictive performance to centralized ML, and each clinical site showed similar performance without sharing local data. Moreover, the predictive performance of FL models was superior to site specific models trained without collaboration. This work highlights that FL can eliminate the need for data sharing without compromising model efficacy.