CoheMark: A Novel Sentence-Level Watermark for Enhanced Text Quality
CoheMark addresses the challenge of maintaining semantic integrity and logical fluency in watermarked text used for content traceability, though the contribution appears incremental because it builds on existing sentence-level techniques.
The paper tackles the problem of balancing high text quality with robust watermark detection in sentence-level watermarking for large language models. It proposes CoheMark, which achieves high watermark strength with minimal impact on text quality.
Watermarking is a method for tracing the usage of content generated by large language models. Sentence-level watermarking helps preserve the semantic integrity of individual sentences while maintaining greater robustness. However, many existing sentence-level watermarking techniques depend on arbitrary segmentation or generation processes to embed watermarks, which can limit the availability of appropriate sentences. This limitation, in turn, compromises the quality of the generated response. To address the challenge of balancing high text quality with robust watermark detection, we propose CoheMark, a sentence-level watermarking technique that exploits the cohesive relationships between sentences to improve logical fluency. The core methodology of CoheMark involves selecting sentences through trained fuzzy c-means clustering and applying specific next-sentence selection criteria. Experimental evaluations demonstrate that CoheMark achieves high watermark strength while exerting minimal impact on text quality.
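To make the clustering step concrete, the following is a minimal sketch of standard fuzzy c-means applied to sentence vectors. It is not CoheMark's implementation: the abstract does not specify the embedding model, the number of clusters, the fuzzifier, or the next-sentence selection criteria. The function names (`fuzzy_c_means`, `memberships`, `select_candidate`), the fuzzifier `m=2.0`, the toy 2-D "embeddings", and the rule of picking the candidate with the highest membership in a target cluster are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, U), where U[i, k] is the
    soft membership of point i in cluster k and each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)          # random soft assignment
    for _ in range(n_iter):
        W = U ** m                              # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)          # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def memberships(x, centers, m=2.0):
    """Soft cluster memberships of one new vector x, given trained centers."""
    dist = np.maximum(np.linalg.norm(centers - x, axis=1), 1e-12)
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

def select_candidate(candidates, centers, target_cluster, m=2.0):
    """Hypothetical selection rule (NOT the paper's criteria): return the
    index of the candidate with the highest membership in target_cluster."""
    scores = [memberships(c, centers, m)[target_cluster] for c in candidates]
    return int(np.argmax(scores))

# Toy 2-D vectors standing in for sentence embeddings (assumption).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, U = fuzzy_c_means(X, n_clusters=2)
target = int(np.argmax(U[0]))                   # cluster of the first group
candidates = np.array([[10.5, 10.5], [0.2, 0.3]])
chosen = select_candidate(candidates, centers, target)
```

Because fuzzy c-means yields graded memberships rather than hard labels, a selection rule can rank candidate next sentences by how strongly they belong to a cluster instead of accepting or rejecting them outright. How CoheMark actually uses the memberships is defined in the paper, not here.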