CL · Dec 4, 2023

New Evaluation Metrics Capture Quality Degradation due to LLM Watermarking

arXiv:2312.02382v113 citations · h-index: 1 · Trans. Mach. Learn. Res.
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for better evaluation metrics in watermarking research. Its contribution is incremental, since it focuses on assessing existing watermarks rather than proposing new watermarking algorithms.

The paper tackles the problem of evaluating LLM watermarking by introducing two new methods, evaluation by an LLM judger and binary classification on text embeddings. It finds that current watermarking techniques are easily detectable and that they degrade the coherence and depth of the generated text.

With the increasing use of large language models (LLMs) like ChatGPT, watermarking has emerged as a promising approach for tracing machine-generated content. However, research on LLM watermarking often relies on simple perplexity or diversity-based measures to assess the quality of watermarked text, which can mask important limitations in watermarking. Here we introduce two new easy-to-use methods for evaluating watermarking algorithms for LLMs: 1) evaluation by an LLM judger with specific guidelines; and 2) binary classification on text embeddings to distinguish between watermarked and unwatermarked text. We apply these methods to characterize the effectiveness of current watermarking techniques. Our experiments, conducted across various datasets, reveal that current watermarking methods are detectable by even simple classifiers, challenging the notion of watermarking subtlety. We also find, through the LLM judger, that watermarking impacts text quality, especially by degrading the coherence and depth of the response. Our findings underscore the trade-off between watermark robustness and text quality and highlight the importance of having more informative metrics to assess watermarking quality.
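The second method in the abstract, binary classification on text embeddings, can be illustrated with a minimal sketch. This is not the authors' code. It assumes a plain logistic-regression classifier (one possible "simple classifier") and uses synthetic Gaussian vectors as hypothetical stand-ins for real text embeddings. All function names and parameters (`make_synthetic_embeddings`, `shift`, and so on) are invented for illustration.

```python
import math
import random


def make_synthetic_embeddings(n, dim, shift, seed):
    """Hypothetical stand-in for real text embeddings.

    Each vector is Gaussian noise; the two classes are offset by
    +/- shift/2 in every dimension. Label 1 = watermarked, 0 = unwatermarked.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = shift * (label - 0.5)
        vec = [rng.gauss(center, 1.0) for _ in range(dim)]
        data.append((vec, label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(vec, w, b):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b)


def train_logistic_regression(data, epochs=200, lr=0.1):
    """Full-batch gradient descent on the logistic loss."""
    dim = len(data[0][0])
    n = len(data)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for vec, label in data:
            err = predict_proba(vec, w, b) - label
            for j in range(dim):
                grad_w[j] += err * vec[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(data, w, b):
    correct = sum(
        1 for vec, label in data
        if (predict_proba(vec, w, b) >= 0.5) == (label == 1)
    )
    return correct / len(data)


if __name__ == "__main__":
    train = make_synthetic_embeddings(200, 16, shift=1.0, seed=0)
    test = make_synthetic_embeddings(200, 16, shift=1.0, seed=1)
    w, b = train_logistic_regression(train)
    print(f"held-out accuracy: {accuracy(test, w, b):.3f}")
```

In a real evaluation, the synthetic vectors would be replaced by embeddings of watermarked and unwatermarked generations. Held-out accuracy well above 50% would then indicate that the watermark leaves a detectable trace in embedding space, which is the sense in which the abstract calls watermarked text "detectable".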
