SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models
SimMark addresses the need for traceable LLM outputs to combat misuse and offers a practical solution for both open and API-based models, though the contribution is incremental because it builds on existing watermarking techniques.
The paper tackles the problem of detecting LLM-generated text by introducing SimMark, a sentence-level watermarking algorithm that embeds statistical patterns using semantic similarity and rejection sampling. It reports robustness against paraphrasing attacks and claims to surpass prior sentence-level methods in robustness, sampling efficiency, and applicability across domains while maintaining text quality.
The widespread adoption of large language models (LLMs) necessitates reliable methods to detect LLM-generated text. We introduce SimMark, a robust sentence-level watermarking algorithm that makes LLMs' outputs traceable without requiring access to model internals, making it compatible with both open and API-based LLMs. By leveraging the similarity of semantic sentence embeddings combined with rejection sampling to embed detectable statistical patterns imperceptible to humans, and employing a soft counting mechanism, SimMark achieves robustness against paraphrasing attacks. Experimental results demonstrate that SimMark sets a new benchmark for robust watermarking of LLM-generated content, surpassing prior sentence-level watermarking techniques in robustness, sampling efficiency, and applicability across diverse domains, all while maintaining text quality and fluency.
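The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the acceptance rule, or the detection statistic, so the choices below are assumptions made for illustration. Specifically, the sketch assumes cosine similarity between consecutive sentence embeddings, an acceptance interval [low, high], an exponential decay for soft counting, and a one-proportion z-test for detection. All function and parameter names (sample_sentence, soft_count, z_score, low, high, decay, p0) are hypothetical, and embed and propose stand in for a real sentence encoder and a real LLM sampling call.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sample_sentence(prev_vec, propose, embed, low, high, max_tries=20):
    """Rejection sampling (assumed acceptance rule).

    Draw candidate sentences from `propose()` until the similarity between
    the candidate's embedding and the previous sentence's embedding falls
    inside [low, high]. Only generated text is needed, not model internals.
    Returns (sentence, accepted); if the budget runs out, the last candidate
    is returned with accepted=False.
    """
    candidate = None
    for _ in range(max_tries):
        candidate = propose()
        if low <= cosine(prev_vec, embed(candidate)) <= high:
            return candidate, True
    return candidate, False


def soft_count(vectors, low, high, decay=20.0):
    """Soft counting (assumed form).

    A consecutive pair inside the interval contributes 1. A pair just
    outside it contributes exp(-decay * gap), so a paraphrase that nudges
    the similarity slightly past the boundary still earns partial credit.
    """
    total = 0.0
    for u, v in zip(vectors, vectors[1:]):
        sim = cosine(u, v)
        if low <= sim <= high:
            total += 1.0
        else:
            gap = low - sim if sim < low else sim - high
            total += math.exp(-decay * gap)
    return total


def z_score(count, n_pairs, p0):
    """One-proportion z-statistic (assumed detection test).

    `p0` is the assumed probability that an unwatermarked sentence pair
    lands in the interval by chance; it would have to be estimated on
    human-written text.
    """
    return (count - p0 * n_pairs) / math.sqrt(n_pairs * p0 * (1.0 - p0))
```

In this sketch a detector would embed each sentence of a suspect text, pass the embeddings to soft_count, and flag the text when z_score exceeds a threshold chosen for the desired false-positive rate. Hard counting corresponds to a very large decay, which is why the soft variant is the one that tolerates small paraphrase-induced shifts in similarity.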