CL · AI · Apr 14, 2025

You've Changed: Detecting Modification of Black-Box Large Language Models

arXiv:2504.12335v1 · 1 citation · h-index: 24
Originality: Incremental advance
AI Analysis

This work helps developers who need to monitor LLM behavior for changes without running expensive benchmarks. It is an incremental advance because it builds on existing feature-based methods.

The paper tackles the problem of detecting changes in black-box large language models (LLMs) served via APIs. It compares the distributions of linguistic and psycholinguistic features in generated text and shows that simple features combined with a statistical test can distinguish between models, evaluated on five OpenAI completion models and Meta's Llama 3 70B chat model.

Large Language Models (LLMs) are often provided as a service via an API, making it challenging for developers to detect changes in their behavior. We present an approach to monitor LLMs for changes by comparing the distributions of linguistic and psycholinguistic features of generated text. Our method uses a statistical test to determine whether the distributions of features from two samples of text are equivalent, allowing developers to identify when an LLM has changed. We demonstrate the effectiveness of our approach using five OpenAI completion models and Meta's Llama 3 70B chat model. Our results show that simple text features coupled with a statistical test can distinguish between language models. We also explore the use of our approach to detect prompt injection attacks. Our work enables frequent LLM change monitoring and avoids computationally expensive benchmark evaluations.
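The monitoring idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the three surface features, the permutation test on the difference of means, the Bonferroni correction, and the names `text_features`, `permutation_test`, and `detect_change` are all assumptions chosen for the example. The abstract does not specify which features or which statistical test the authors use.

```python
import random
import statistics


def text_features(text):
    """Return a few simple surface features for one generated text.

    These features are illustrative assumptions, not the paper's feature set.
    """
    words = text.split()
    n_words = len(words)
    return {
        "n_words": float(n_words),
        "mean_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
    }


def permutation_test(xs, ys, n_resamples=2000, seed=0):
    """Two-sided permutation test on the absolute difference of means.

    Assumed stand-in for the (unspecified) statistical test in the paper.
    Returns an estimated p-value in (0, 1].
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(xs) - statistics.fmean(ys))
    pooled = list(xs) + list(ys)
    n_x = len(xs)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


def detect_change(reference_texts, current_texts, alpha=0.05):
    """Flag a change if any feature's distribution differs significantly.

    reference_texts: outputs collected earlier from the API for fixed prompts.
    current_texts:   outputs collected now for the same prompts.
    """
    ref = [text_features(t) for t in reference_texts]
    cur = [text_features(t) for t in current_texts]
    names = list(ref[0])
    threshold = alpha / len(names)  # Bonferroni correction across features
    p_values = {
        name: permutation_test([f[name] for f in ref], [f[name] for f in cur])
        for name in names
    }
    changed = any(p < threshold for p in p_values.values())
    return changed, p_values
```

In use, a developer would store a reference sample of completions for a fixed prompt set, periodically collect a fresh sample, and call `detect_change(reference, fresh)`. Because only cheap text features are computed, the check can run far more often than a full benchmark evaluation.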

