Agentic LLMs as Powerful Deanonymizers: Re-identification of Participants in the Anthropic Interviewer Dataset

arXiv:2601.05918v1 | 4 citations | h-index: 2

Originality: Incremental advance

AI Analysis

The paper highlights a privacy risk for participants in qualitative datasets: modern LLM agents can re-identify interviewees with little effort. The advance is incremental, but it matters in practice for how such data is released.

The study tackled the problem of re-identifying participants in the Anthropic Interviewer dataset using agentic LLMs, showing that six out of twenty-four scientist interviews could be linked to specific scientific works and authors, sometimes uniquely identifying individuals.

On December 4, 2025, Anthropic released Anthropic Interviewer, an AI tool for running qualitative interviews at scale, along with a public dataset of 1,250 interviews with professionals, including 125 scientists, about their use of AI for research. Focusing on the scientist subset, I show that widely available LLMs with web search and agentic capabilities can link six out of twenty-four interviews to specific scientific works, recovering associated authors and, in some cases, uniquely identifying the interviewees. My contribution is to show that modern LLM-based agents make such re-identification attacks easy and low-effort: off-the-shelf tools can, with a few natural-language prompts, search the web, cross-reference details, and propose likely matches, effectively lowering the technical barrier. Existing safeguards can be bypassed by breaking down the re-identification into benign tasks. I outline the attack at a high level, discuss implications for releasing rich qualitative data in the age of LLM agents, and propose mitigation recommendations and open problems. I have notified Anthropic of my findings.
