CL · May 26

Vectors Are Not Neutral: Sensitive-Information Inference from Exported LLM Representations in Summarization

Weixin Liu, Bowen Qu, Juming Xiong, Congning Ni, Bradley A. Malin, Zhijun Yin

arXiv:2605.26433 · 84.7

Predicted impact: top 53% in CL · last 90 days
Originality: Incremental advance

AI Analysis

For developers deploying LLM summarization systems with downstream vector access, this work highlights the need to audit and mitigate privacy risks on the exact exported artifact.

The paper studies sensitive-information inference from exported LLM representations in clinical summarization, showing that reducing recoverability of a sensitive label from one exported artifact does not necessarily reduce it from another. SurfaceLoRA reduces recoverability of EHR-recorded race from the targeted final-token hidden state toward chance while preserving summarization utility, but recoverability remains substantially higher from the untargeted mean-pooled artifact.

Large language model (LLM) summarization systems may pass compact vector representations of private inputs to downstream retrieval, monitoring, audit, or analytic workflows. Even when source documents remain access-restricted, derived vectors may be handled under different access controls and still support sensitive-information inference, creating a residual information-disclosure risk. We study this issue in clinical discharge-summary generation as a high-stakes case study, using electronic health record (EHR)-recorded race as a controlled sensitive-label audit. We audit two artifacts that a system might retain or expose to downstream components: the final prompt-token hidden state and the mean-pooled prompt representation. Our results show that reducing recoverability of the case-study sensitive label from one exported artifact does not necessarily reduce recoverability from another. As a mitigation case study, we introduce SurfaceLoRA, an exported-vector-targeted parameter-efficient fine-tuning method that uses a gradient-reversal discriminator attached to a designated exported vector. Under a balanced five-way probing protocol, SurfaceLoRA reduces EHR-recorded race recoverability from the targeted final-token artifact toward chance while preserving summarization utility, yet recoverability remains substantially higher from untargeted pooled artifacts. These findings show that privacy auditing and mitigation should be performed on the exact vector artifact retained or exposed to downstream components.
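The two audited artifacts differ only in how the prompt's per-token hidden states are reduced to a single vector. The sketch below illustrates that difference in plain Python. It is not the authors' implementation: the function names, the list-of-lists layout, and the use of a 0/1 padding mask are all assumptions made for illustration. It also shows the chance level for a balanced five-way probe, which is the reference point for "toward chance".

```python
from typing import List, Sequence

Vector = List[float]


def _real_positions(mask: Sequence[int]) -> List[int]:
    """Indices of non-padding prompt tokens (mask value 1)."""
    positions = [i for i, m in enumerate(mask) if m]
    if not positions:
        raise ValueError("prompt has no non-padding tokens")
    return positions


def final_token_state(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Artifact 1 (hypothetical helper): hidden state of the last real prompt token."""
    return list(hidden[_real_positions(mask)[-1]])


def mean_pooled_state(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Artifact 2 (hypothetical helper): mean of hidden states over real prompt tokens."""
    positions = _real_positions(mask)
    dim = len(hidden[0])
    return [sum(hidden[i][d] for i in positions) / len(positions) for d in range(dim)]


def chance_accuracy(num_classes: int) -> float:
    """Expected accuracy of random guessing when classes are balanced."""
    return 1.0 / num_classes


# Toy prompt: three real tokens and one padding token, hidden size 2.
hidden = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

final_vec = final_token_state(hidden, mask)
pooled_vec = mean_pooled_state(hidden, mask)
five_way_chance = chance_accuracy(5)
```

Because the pooled vector aggregates every prompt position while the final-token vector reads only one, a mitigation attached to one of them places no direct constraint on the other, which is why each exported artifact would need its own probe.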
