LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps
This work addresses the need for reliable detection of AI-generated text, which matters for applications such as content moderation and academic integrity. The contribution appears incremental, since it builds on existing watermarking approaches.
The paper tackles the problem of distinguishing text generated by large language models from human-written text. It proposes two watermarking schemes: an undetectable, elementary scheme for the closed setting, and an unremovable scheme for the harder open setting, where the adversary has access to most of the model.
Given a text, can we determine whether it was generated by a large language model (LLM) or by a human? A widely studied approach to this problem is watermarking. We propose an undetectable and elementary watermarking scheme in the closed setting. Also, in the harder open setting, where the adversary has access to most of the model, we propose an unremovable watermarking scheme.