CRCL Sep 29, 2025

Fingerprinting LLMs via Prompt Injection

arXiv:2509.25448v2 | 1 citation | h-index: 8
Originality: Highly original
AI Analysis

This work addresses the need for reliable provenance detection in LLMs, which is crucial for security and accountability. It is incremental in that it builds on existing fingerprinting concepts, although the method itself is novel.

The paper tackles the problem of detecting whether one large language model (LLM) is derived from another after modifications such as post-training or quantization. It proposes LLMPrint, a framework that uses optimized prompt injection to create robust fingerprints, and reports high true positive rates with near-zero false positive rates on around 700 variants.

Large language models (LLMs) are often modified after release through post-processing such as post-training or quantization, which makes it challenging to determine whether one model is derived from another. Existing provenance detection methods have two main limitations: (1) they embed signals into the base model before release, which is infeasible for already published models, or (2) they compare outputs across models using hand-crafted or random prompts, which are not robust to post-processing. In this work, we propose LLMPrint, a novel detection framework that constructs fingerprints by exploiting LLMs' inherent vulnerability to prompt injection. Our key insight is that by optimizing fingerprint prompts to enforce consistent token preferences, we can obtain fingerprints that are both unique to the base model and robust to post-processing. We further develop a unified verification procedure that applies to both gray-box and black-box settings, with statistical guarantees. We evaluate LLMPrint on five base models and around 700 post-trained or quantized variants. Our results show that LLMPrint achieves high true positive rates while keeping false positive rates near zero.
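To make the idea of verification "with statistical guarantees" concrete, here is a minimal sketch of how a decision could be made from token-preference agreements. It is not the paper's procedure: the one-sided binomial test, the null agreement rate `p_null = 0.5`, the threshold `alpha`, and the function names are all assumptions chosen for illustration. Each entry of `agreements` is assumed to record whether the suspect model picked the token that a fingerprint prompt was optimized to elicit from the base model.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def verify(agreements: list[bool], alpha: float = 1e-3,
           p_null: float = 0.5) -> tuple[bool, float]:
    """Hypothetical verification step (an assumption, not the paper's test).

    Null hypothesis: the suspect model is unrelated to the base model and
    agrees with each fingerprint's preferred token with probability p_null.
    The model is flagged as derived when the one-sided p-value is below alpha,
    so alpha bounds the false positive rate under this assumed null.
    """
    n = len(agreements)
    k = sum(agreements)  # number of fingerprint prompts the suspect agreed with
    p_value = binomial_tail(k, n, p_null)
    return p_value < alpha, p_value


if __name__ == "__main__":
    # A suspect that agrees on all 20 fingerprints versus one that agrees on half.
    print(verify([True] * 20))
    print(verify([True] * 10 + [False] * 10))
```

Under these assumptions, a suspect that matches every fingerprint is flagged, while one that matches only about half is not. In a black-box setting the agreements would come from sampled outputs; in a gray-box setting they could come from comparing token probabilities directly.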
