CL · Dec 22, 2025

HATS: High-Accuracy Triple-Set Watermarking for Large Language Models

arXiv:2512.19378v1 · h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses the need for reliable watermarking to curb misuse of LLM outputs. It appears incremental: it builds on existing watermarking methods and adds a three-way vocabulary partitioning scheme.

The paper tackles misuse of LLM-generated text by proposing a triple-set watermarking technique that partitions the vocabulary into Green, Yellow, and Red sets at each decoding step. It reports high detection accuracy at a fixed false-positive rate while maintaining text readability.

Misuse of LLM-generated text can be curbed by watermarking techniques that embed implicit signals into the output. We propose a watermark that partitions the vocabulary at each decoding step into three sets (Green/Yellow/Red) with fixed ratios and restricts sampling to the Green and Yellow sets. At detection time, we replay the same partitions, compute Green-enrichment and Red-depletion statistics, convert them to one-sided z-scores, and aggregate their p-values via Fisher's method to decide whether a passage is watermarked. We implement generation, detection, and testing on Llama 2 7B, and evaluate true-positive rate, false-positive rate, and text quality. Results show that the triple-partition scheme achieves high detection accuracy at fixed FPR while preserving readability.
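The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made for illustration only: the 0.5/0.25/0.25 ratios, the toy vocabulary size, the hypothetical key `KEY`, seeding each partition from a hash of the previous token, and the uniform toy "model" standing in for Llama 2 7B. The helper names `partition`, `generate`, and `detect` are likewise hypothetical.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000            # toy vocabulary size (assumption)
GREEN, YELLOW = 0.5, 0.25    # assumed fixed ratios; Red takes the rest
RED = 1.0 - GREEN - YELLOW
KEY = "demo-key"             # hypothetical secret key


def partition(prev_token):
    """Split the vocabulary into (green, yellow, red) sets.

    Seeding from a hash of the key and the previous token is an assumption;
    it lets the detector replay the same partition later.
    """
    digest = hashlib.sha256(f"{KEY}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(VOCAB_SIZE))
    rng.shuffle(ids)
    n_green = int(GREEN * VOCAB_SIZE)
    n_yellow = int(YELLOW * VOCAB_SIZE)
    green = set(ids[:n_green])
    yellow = set(ids[n_green:n_green + n_yellow])
    red = set(ids[n_green + n_yellow:])
    return green, yellow, red


def generate(length, watermark, seed=0):
    """Toy generator: uniform sampling, restricted to Green and Yellow if watermarked."""
    rng = random.Random(seed)
    tokens = [0]  # fixed start token
    for _ in range(length):
        green, yellow, _red = partition(tokens[-1])
        allowed = sorted(green | yellow) if watermark else range(VOCAB_SIZE)
        tokens.append(rng.choice(allowed))
    return tokens


def normal_sf(z):
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def detect(tokens, alpha=0.01):
    """Replay partitions, score Green enrichment and Red depletion, combine with Fisher."""
    n = len(tokens) - 1
    n_green = n_red = 0
    for prev, tok in zip(tokens, tokens[1:]):
        green, _yellow, red = partition(prev)
        n_green += tok in green
        n_red += tok in red
    # One-sided z-scores against the unwatermarked expectation.
    z_green = (n_green - GREEN * n) / math.sqrt(n * GREEN * (1 - GREEN))
    z_red = (RED * n - n_red) / math.sqrt(n * RED * (1 - RED))
    p_green, p_red = normal_sf(z_green), normal_sf(z_red)
    # Fisher's statistic; the floor guards against log(0).
    stat = -2.0 * (math.log(max(p_green, 1e-300)) + math.log(max(p_red, 1e-300)))
    # Chi-square survival function with 4 degrees of freedom (closed form).
    p_combined = math.exp(-stat / 2) * (1 + stat / 2)
    return p_combined < alpha, p_combined


if __name__ == "__main__":
    print(detect(generate(200, watermark=True)))
    print(detect(generate(200, watermark=False)))
```

In this sketch the watermarked text never contains a Red token, so the Red-depletion score grows with length, while Green tokens appear at roughly 2/3 instead of the baseline 1/2. Note that Green and Red counts are not independent, so Fisher's method is only approximate here; the threshold `alpha` would need calibrating on unwatermarked text to fix the false-positive rate.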

