LLMmap: Fingerprinting For Large Language Models
This work addresses a security and privacy problem for users and developers of LLM-integrated applications, and it proposes a novel method for a known bottleneck: identifying which model an application is running.
The paper tackles the problem of identifying specific LLM versions in applications by introducing LLMmap, a fingerprinting technique that uses crafted queries to achieve over 95% accuracy in identifying 42 LLM versions with as few as 8 interactions.
We introduce LLMmap, a first-generation fingerprinting technique targeted at LLM-integrated applications. LLMmap employs an active fingerprinting approach, sending carefully crafted queries to the application and analyzing the responses to identify the specific LLM version in use. Our query selection is informed by domain expertise on how LLMs generate uniquely identifiable responses to thematically varied prompts. With as few as 8 interactions, LLMmap can identify 42 different LLM versions with over 95% accuracy. More importantly, LLMmap is designed to be robust across different application layers, allowing it to identify LLM versions, whether open-source or proprietary, from various vendors, operating under various unknown system prompts, stochastic sampling hyperparameters, and even complex generation frameworks such as RAG or Chain-of-Thought. We discuss potential mitigations and demonstrate that, against resourceful adversaries, effective countermeasures may be challenging or even unrealizable.
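To make the active fingerprinting loop concrete, the following is a minimal sketch of the general idea: send a fixed set of probe queries, collect the responses into a trace, and match that trace against reference traces of known models. It is not LLMmap's actual pipeline. The probe strings, the stub models, and the bag-of-words cosine matcher are all assumptions chosen for illustration; the abstract does not specify the query set or the inference method.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Dict, List

# Hypothetical probe queries; the real query set is not given in the abstract.
PROBES: List[str] = [
    "Who created you?",
    "What is your knowledge cutoff?",
]


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens with counts (a deliberately crude feature)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fingerprint(app: Callable[[str], str], probes: List[str] = PROBES) -> Counter:
    """Query the application with every probe and merge the responses into one trace."""
    trace: Counter = Counter()
    for query in probes:
        trace.update(bag_of_words(app(query)))
    return trace


def identify(app: Callable[[str], str], references: Dict[str, Counter]) -> str:
    """Return the label of the reference trace most similar to the app's trace."""
    trace = fingerprint(app)
    return max(references, key=lambda name: cosine(trace, references[name]))


# Stub "models" standing in for real LLM endpoints (assumption, for the demo only).
def model_alpha(query: str) -> str:
    return f"I am Alpha, built by ExampleCorp. You asked: {query}"


def model_beta(query: str) -> str:
    return f"As Beta from SampleLabs, I cannot discuss that. {query}"


REFERENCES = {
    "alpha": fingerprint(model_alpha),
    "beta": fingerprint(model_beta),
}


def wrapped_app(query: str) -> str:
    """An application layer that alters the output, loosely mimicking a system prompt."""
    return "Sure! " + model_alpha(query)
```

Calling `identify(wrapped_app, REFERENCES)` matches the wrapped application to the `alpha` reference even though the application layer has altered the raw output. A real system would need a far more robust matcher to cope with unknown system prompts, sampling randomness, and frameworks such as RAG, which is the hard part the abstract highlights.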