CR · AI · Mar 15

Membership Inference for Contrastive Pre-training Models with Text-only PII Queries

arXiv:2603.14222 · 72.7 · h-index: 7
Predicted impact: top 18% in CR · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns for users of vision-language and audio-language systems by enabling efficient auditing without exposing sensitive biometric data. The contribution is incremental, since it builds on existing membership inference methods.

The paper addresses the problem of auditing contrastive pre-training models such as CLIP and CLAP for memorization of Personally Identifiable Information (PII). It proposes UMID, a text-only membership inference framework that improves detection performance and efficiency over prior attacks while keeping the auditing cost under one second.

Contrastive pre-training models such as CLIP and CLAP underpin many vision-language and audio-language systems, yet their reliance on web-scale data raises growing concerns about memorizing Personally Identifiable Information (PII). Auditing such models via membership inference is challenging in practice: shadow-model membership inference attacks (MIAs) are computationally prohibitive for large multimodal backbones, and existing multimodal attacks typically require querying the target with paired biometric inputs, thereby directly exposing sensitive biometric information to the target model. We propose Unimodal Membership Inference Detector (UMID), a text-only auditing framework that performs text-guided cross-modal latent inversion and extracts two complementary signals: similarity (alignment to the queried text) and variability (consistency across randomized inversions). UMID compares these statistics to a lightweight non-member reference constructed from synthetic gibberish and makes decisions via an ensemble of unsupervised anomaly detectors. Comprehensive experiments across diverse CLIP and CLAP architectures demonstrate that UMID significantly improves both effectiveness and efficiency over prior MIAs, delivering strong detection performance with sub-second auditing cost while complying with realistic privacy constraints.
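
The decision stage described in the abstract (compare a query's similarity and variability statistics against a gibberish-derived non-member reference, then let several unsupervised detectors vote) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the abstract does not say which anomaly detectors UMID uses, how the gibberish is generated, or how votes are combined. The three detectors (z-score, k-nearest-neighbour distance, range check), the majority vote, and the functions make_gibberish and flag_as_member are hypothetical stand-ins. The latent inversion step that would produce the (similarity, variability) pairs is not shown, and the detectors are two-sided because the abstract does not state in which direction members deviate.

```python
import math
import random
import statistics
import string


def make_gibberish(n, length=12, seed=0):
    """Random lowercase strings, assumed to stand in for text the model never saw."""
    rng = random.Random(seed)
    return ["".join(rng.choice(string.ascii_lowercase) for _ in range(length))
            for _ in range(n)]


def zscore_detector(reference, query, threshold=3.0):
    """Flag the query if any feature is more than `threshold` std devs from the reference mean."""
    for dim, value in enumerate(query):
        column = [row[dim] for row in reference]
        mean = statistics.fmean(column)
        std = statistics.pstdev(column) or 1e-12  # guard against a constant column
        if abs(value - mean) / std > threshold:
            return True
    return False


def knn_detector(reference, query, k=3, quantile=0.95):
    """Flag the query if its mean distance to its k nearest reference points exceeds
    the `quantile` of the same score computed leave-one-out on the reference set."""
    def knn_score(point, pool):
        dists = sorted(math.dist(point, other) for other in pool)
        return statistics.fmean(dists[:k])

    ref_scores = sorted(knn_score(p, reference[:i] + reference[i + 1:])
                        for i, p in enumerate(reference))
    cutoff = ref_scores[min(len(ref_scores) - 1, int(quantile * len(ref_scores)))]
    return knn_score(query, reference) > cutoff


def range_detector(reference, query, margin=0.1):
    """Flag the query if any feature falls outside the reference range widened by `margin`."""
    for dim, value in enumerate(query):
        column = [row[dim] for row in reference]
        low, high = min(column), max(column)
        slack = margin * (high - low)
        if value < low - slack or value > high + slack:
            return True
    return False


def flag_as_member(reference, query):
    """Majority vote over the detectors; True means the query looks unlike the non-member reference."""
    detectors = (zscore_detector, knn_detector, range_detector)
    votes = sum(1 for detect in detectors if detect(reference, query))
    return votes * 2 > len(detectors)


if __name__ == "__main__":
    # Synthetic (similarity, variability) pairs standing in for statistics that
    # would be measured by running the inversion on each gibberish string.
    rng = random.Random(1)
    prompts = make_gibberish(50)
    reference = [(rng.gauss(0.20, 0.02), rng.gauss(0.10, 0.01)) for _ in prompts]
    print(flag_as_member(reference, (0.45, 0.02)))
    print(flag_as_member(reference, (0.20, 0.10)))
```

In this sketch the reference set needs no labelled members, which mirrors the unsupervised setting the abstract describes; the thresholds (3 standard deviations, the 0.95 quantile, a 10% range margin) are arbitrary illustrative choices and not values from the paper.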

