CR AI CY Nov 24, 2025

A Longitudinal Measurement of Privacy Policy Evolution for Large Language Models

Zhen Tao, Shidong Pan, Zhenchang Xing, Emily Black, Talia Gillis, Chunyang Chen

arXiv:2511.21758v18.62 citations

Originality: Incremental advance

AI Analysis

This work addresses a gap in understanding privacy policies for LLM services, which is crucial for users and regulators concerned about data practices in AI systems.

The paper presents the first longitudinal study of privacy policies for large language model (LLM) providers, analyzing 74 policies and 115 supplemental documents from 11 providers across 5 countries up to August 2025. It finds that these policies are substantially longer than those of other software formats, require college-level reading ability, and remain highly vague.

Large language model (LLM) services have been rapidly integrated into people's daily lives as chatbots and agentic systems. They are nourished by collecting rich streams of data, raising privacy concerns around excessive collection of sensitive personal information. Privacy policies are the fundamental mechanism for informing users about data practices in the modern information privacy paradigm. Although traditional web and mobile policies are well studied, the privacy policies of LLM providers, their LLM-specific content, and their evolution over time remain largely underexplored. In this paper, we present the first longitudinal empirical study of privacy policies for mainstream LLM providers worldwide. We curate a chronological dataset of 74 historical privacy policies and 115 supplemental privacy documents from 11 LLM providers across 5 countries up to August 2025, and extract over 3,000 sentence-level edits between consecutive policy versions. We compare LLM privacy policies to those of other software formats, propose a taxonomy tailored to LLM privacy policies, annotate policy edits, and align them with a timeline of key LLM ecosystem events. Results show that LLM privacy policies are substantially longer, demand college-level reading ability, and remain highly vague. Our taxonomy analysis reveals patterns in how providers disclose LLM-specific practices and highlights regional disparities in coverage. Policy edits are concentrated in first-party data collection and international/specific-audience sections, and product releases and regulatory actions are the primary drivers of those edits. Together, these findings shed light on the status quo and the evolution of LLM privacy policies.
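The abstract mentions extracting sentence-level edits between consecutive policy versions but does not say how this is done. As a rough illustration only, the sketch below shows one plausible way to compute such edits with Python's standard difflib module. The function names, the naive regex sentence splitter, and the two sample policy texts are all assumptions made for this example and are not taken from the paper.

```python
import difflib
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_edits(old_text, new_text):
    """Return (tag, old_sentences, new_sentences) for every changed span.

    tag is one of 'insert', 'delete', or 'replace', as reported by
    difflib.SequenceMatcher when comparing the two sentence lists.
    """
    old = split_sentences(old_text)
    new = split_sentences(new_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((tag, old[i1:i2], new[j1:j2]))
    return edits


# Hypothetical consecutive versions of a policy, invented for illustration.
old_policy = "We collect your email. We do not sell data. Contact us with questions."
new_policy = "We collect your email. We may use prompts to train models. We do not sell data."

edits = sentence_edits(old_policy, new_policy)
for tag, removed, added in edits:
    print(tag, removed, added)
```

A real pipeline would likely need a more robust sentence segmenter and some fuzzy matching so that lightly reworded sentences are reported as modifications instead of a delete plus an insert.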
