CL · Feb 17, 2025

Idiosyncrasies in Large Language Models

arXiv:2502.12150v2 · 28 citations · h-index: 12 · Has Code · ICML
Originality: Synthesis-oriented
AI Analysis

This work addresses model fingerprinting and synthetic data evaluation for AI researchers and practitioners, though it is incremental in that it applies existing fine-tuning methods to a new task.

The study tackled the problem of identifying unique output patterns in Large Language Models (LLMs) by training classifiers to predict the source model from generated text, achieving 97.1% accuracy in a five-way classification task.

In this work, we unveil and study idiosyncrasies in Large Language Models (LLMs) -- unique patterns in their outputs that can be used to distinguish the models. To do so, we consider a simple classification task: given a particular text output, the objective is to predict the source LLM that generates the text. We evaluate this synthetic task across various groups of LLMs and find that simply fine-tuning text embedding models on LLM-generated texts yields excellent classification accuracy. Notably, we achieve 97.1% accuracy on held-out validation data in the five-way classification problem involving ChatGPT, Claude, Grok, Gemini, and DeepSeek. Our further investigation reveals that these idiosyncrasies are rooted in word-level distributions. These patterns persist even when the texts are rewritten, translated, or summarized by an external LLM, suggesting that they are also encoded in the semantic content. Additionally, we leverage LLMs as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies. Finally, we discuss the broader implications of our findings, including training on synthetic data, inferring model similarity, and robust evaluation of LLMs. Code is available at https://github.com/locuslab/llm-idiosyncrasies.
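To make the classification task in the abstract concrete, the sketch below trains a tiny classifier that predicts a source label from a text using only word counts. It is not the paper's method: the paper fine-tunes text embedding models, whereas this swaps in a plain multinomial naive Bayes model to illustrate how word-level distributions alone can carry a signal. The class name `WordNaiveBayes`, the labels `model_a` and `model_b`, and all example strings are hypothetical and made up for illustration; they are not real model outputs.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude word tokenizer."""
    return text.lower().split()


class WordNaiveBayes:
    """Multinomial naive Bayes over unigram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of texts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed per-word log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy strings, NOT real model outputs; labels are hypothetical.
train_texts = [
    "certainly here is a concise overview of the topic",
    "certainly here is a summary with key points",
    "sure let us delve into the details step by step",
    "sure let us delve into this question together",
]
train_labels = ["model_a", "model_a", "model_b", "model_b"]

clf = WordNaiveBayes().fit(train_texts, train_labels)

# Held-out examples, mirroring the paper's held-out validation setup in miniature.
held_out = [
    ("certainly here is an overview", "model_a"),
    ("sure let us delve into it", "model_b"),
]
accuracy = sum(clf.predict(t) == y for t, y in held_out) / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, the same interface would take generated responses as `texts` and the generating model's name as `labels`; a five-way problem like the one in the abstract has a chance baseline of 20%, which is the figure any reported accuracy should be read against.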

Code Implementations: 1 repo
