SAMark: A Self-Anchored Text Watermarking with Paragraph-Level Paraphrase Robustness

Jiahao Huo, Wenjie Qu, Yibo Yan, Kening Zheng, Jiaheng Zhang, Xuming Hu, Philip S. Yu, Mingxun Zhou

arXiv:2605.2579689.1

AI Analysis

SAMark addresses a gap in text watermarking: robustness to paragraph-level paraphrasing, which prior methods could not handle. It does so by removing the watermark's dependency on sentence order.

SAMark introduces a self-anchored watermarking framework that achieves up to 90.2% TP@FP1% (true-positive rate at a 1% false-positive rate) under paragraph-level paraphrasing attacks, outperforming the strongest prior baseline by more than 30% on average while maintaining generation quality competitive with unwatermarked text.

Semantic-level watermarking (SWM) improves robustness against text modifications by treating sentences as the basic unit. However, robustness to paragraph-level paraphrasing remains difficult because such attacks globally disrupt watermark signals by changing sentence order. In this work, we propose SAMark, a self-anchored watermarking framework that removes the dependency on sentence order by establishing a step-independent green region in semantic space. To improve detectability, we introduce a multi-channel hyperbolic scoring mechanism that amplifies watermark signals while suppressing noise from weakly aligned candidates. We further propose a diversity-aware filtering strategy that combines hard filtering with soft regularization, extending beyond simple n-gram repetition filters to address semantic redundancy. Experimental results show that SAMark achieves up to 90.2% TP@FP1% under typical paragraph-level paraphrasing attacks, outperforming the strongest prior baseline by more than 30% on average, while maintaining generation quality competitive with unwatermarked text and breaking the robustness-quality trade-off that limits prior methods.
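
For readers unfamiliar with the TP@FP1% metric quoted above, the following is a minimal sketch of how such a number is commonly computed from detector scores. It is an illustration under stated assumptions, not the paper's evaluation code: the function name `tpr_at_fpr`, its parameters, and the toy scores are hypothetical, and the sketch assumes that a higher score means "more likely watermarked" and that a text is flagged only when its score is strictly above the threshold.

```python
import math


def tpr_at_fpr(neg_scores, pos_scores, target_fpr=0.01):
    """Return (true-positive rate, threshold) at a false-positive budget.

    neg_scores: detector scores on unwatermarked texts (hypothetical input).
    pos_scores: detector scores on watermarked, possibly paraphrased, texts.
    A text is flagged as watermarked when its score is strictly greater
    than the threshold.
    """
    neg = sorted(neg_scores)
    n = len(neg)
    # Number of unwatermarked texts we may flag without exceeding target_fpr.
    k = min(math.floor(target_fpr * n), n - 1)
    # Use the (k+1)-th largest negative score as the threshold, so that
    # at most k negatives score strictly above it.
    threshold = neg[n - k - 1]
    tpr = sum(s > threshold for s in pos_scores) / len(pos_scores)
    return tpr, threshold


# Toy usage with made-up scores (not results from the paper).
negatives = list(range(100))          # 100 unwatermarked texts
positives = [97, 98, 99, 100, 150]    # 5 watermarked texts
tpr, thr = tpr_at_fpr(negatives, positives)
print(f"threshold={thr}, TP@FP1%={tpr:.1%}")
```

Choosing the threshold from the unwatermarked scores alone is what makes numbers comparable across methods: each detector is held to the same false-alarm budget, and only the fraction of watermarked texts it still catches differs.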
