LG | Oct 14, 2025

An Investigation of Memorization Risk in Healthcare Foundation Models

arXiv:2510.12950v1 | 2 citations | h-index: 7 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns for patients, particularly vulnerable subgroups, in healthcare AI applications, though it is incremental as it focuses on evaluation rather than novel mitigation methods.

The paper addresses privacy risks from memorization in healthcare foundation models by introducing a black-box evaluation framework, validating it on a publicly available EHR model, and releasing an open-source toolkit for reproducible assessments.

Foundation models trained on large-scale de-identified electronic health records (EHRs) hold promise for clinical applications. However, their capacity to memorize patient information raises important privacy concerns. In this work, we introduce a suite of black-box evaluation tests to assess privacy-related memorization risks in foundation models trained on structured EHR data. Our framework includes methods for probing memorization at both the embedding and generative levels, and aims to distinguish between model generalization and harmful memorization in clinically relevant settings. We contextualize memorization in terms of its potential to compromise patient privacy, particularly for vulnerable subgroups. We validate our approach on a publicly available EHR foundation model and release an open-source toolkit to facilitate reproducible and collaborative privacy assessments in healthcare AI.
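To make the generative-level probing idea concrete, the following is a minimal sketch of one common style of black-box test: prompt a model with the first part of a record, compare its continuation with the true continuation, and contrast the scores on training records against held-out records. It is not taken from the paper or its toolkit. The names `continuation_overlap`, `memorization_gap`, and `make_lookup_model`, the exact-position overlap metric, and the toy lookup "model" are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Record = Sequence[str]  # one patient's timeline as a sequence of codes (hypothetical format)


def continuation_overlap(generated: Record, truth: Record) -> float:
    """Fraction of positions where the generated code equals the true code."""
    if not truth:
        return 0.0
    matches = sum(g == t for g, t in zip(generated, truth))
    return matches / len(truth)


def memorization_gap(
    generate: Callable[[Record, int], Record],
    train_records: Sequence[Record],
    heldout_records: Sequence[Record],
    prefix_len: int,
) -> float:
    """Mean overlap on training records minus mean overlap on held-out records.

    A gap near zero suggests the model continues seen and unseen records
    equally well (generalization); a large positive gap suggests it
    reproduces training records specifically (memorization).
    """

    def mean_overlap(records: Sequence[Record]) -> float:
        scores: List[float] = []
        for rec in records:
            prefix, truth = rec[:prefix_len], rec[prefix_len:]
            if not truth:
                continue  # record too short to have a continuation
            generated = generate(prefix, len(truth))
            scores.append(continuation_overlap(generated, truth))
        return sum(scores) / len(scores) if scores else 0.0

    return mean_overlap(train_records) - mean_overlap(heldout_records)


def make_lookup_model(
    train_records: Sequence[Record], prefix_len: int
) -> Callable[[Record, int], Record]:
    """Toy stand-in for a fully memorizing model (illustration only)."""
    table = {tuple(r[:prefix_len]): list(r[prefix_len:]) for r in train_records}

    def generate(prefix: Record, n: int) -> Record:
        # Return the stored continuation if the prefix was seen, else placeholders.
        return table.get(tuple(prefix), ["UNK"] * n)[:n]

    return generate


# Usage with made-up code sequences; only the black-box `generate` call is needed.
train = [["A", "B", "C", "D"], ["E", "F", "G", "H"]]
heldout = [["I", "J", "K", "L"]]
model = make_lookup_model(train, prefix_len=2)
gap = memorization_gap(model, train, heldout, prefix_len=2)
```

Because the test only calls `generate`, it needs no access to model weights, which is what makes it black-box. A real assessment would likely need a softer similarity measure and a baseline for clinically common sequences, since frequent code patterns can be reproduced without any memorization of an individual patient.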

