CL | Oct 17, 2024

A Watermark for Order-Agnostic Language Models

arXiv:2410.13805v1 | 13 citations | h-index: 9 | ICLR
Originality: Highly original
AI Analysis

This work addresses the need for watermarking techniques in order-agnostic language models, an incremental advance relevant to applications such as protein design and conditional text generation.

The paper tackles the problem of watermarking order-agnostic language models, to which existing sequential methods do not directly apply, by introducing Pattern-mark, a pattern-based framework that shows improved detection efficiency, generation quality, and robustness in evaluations on models such as ProteinMPNN and CMLM.

Statistical watermarking techniques are well-established for sequentially decoded language models (LMs). However, these techniques cannot be directly applied to order-agnostic LMs, as the tokens in order-agnostic LMs are not generated sequentially. In this work, we introduce Pattern-mark, a pattern-based watermarking framework specifically designed for order-agnostic LMs. We develop a Markov-chain-based watermark generator that produces watermark key sequences with high-frequency key patterns. Correspondingly, we propose a statistical pattern-based detection algorithm that recovers the key sequence during detection and conducts statistical tests based on the count of high-frequency patterns. Our extensive evaluations on order-agnostic LMs, such as ProteinMPNN and CMLM, demonstrate Pattern-mark's enhanced detection efficiency, generation quality, and robustness, positioning it as a superior watermarking technique for order-agnostic LMs.
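
To make the idea in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a two-key alphabet {0, 1}, a Markov chain that switches key with a hypothetical probability `flip_prob`, and the alternating pairs (0, 1) and (1, 0) as the high-frequency patterns. The detection step swaps in a simple z-test against a null of independent uniform keys; the paper's actual statistical test may differ. The step that ties keys to tokens (for example, a key-specific vocabulary partition used to recover keys from text) is omitted, and all function names are illustrative.

```python
import math
import random

# Assumed high-frequency patterns: adjacent keys that differ.
ALTERNATING = frozenset({(0, 1), (1, 0)})


def generate_key_sequence(n, flip_prob=0.9, seed=0):
    """Sample n binary keys from a 2-state Markov chain.

    With probability flip_prob the next key differs from the current one,
    so alternating pairs appear far more often than under uniform keys.
    """
    rng = random.Random(seed)
    keys = [rng.randrange(2)]
    for _ in range(n - 1):
        prev = keys[-1]
        keys.append(1 - prev if rng.random() < flip_prob else prev)
    return keys


def count_patterns(keys, patterns=ALTERNATING):
    """Count adjacent key pairs that belong to the pattern set."""
    return sum((a, b) in patterns for a, b in zip(keys, keys[1:]))


def pattern_z_score(keys, patterns=ALTERNATING):
    """z-score of the pattern count under a null of i.i.d. uniform keys.

    For i.i.d. uniform bits, the indicators "adjacent keys differ" are
    independent Bernoulli(0.5), so the count is Binomial(n - 1, 0.5).
    """
    windows = len(keys) - 1
    p0 = 0.5
    count = count_patterns(keys, patterns)
    return (count - windows * p0) / math.sqrt(windows * p0 * (1 - p0))


if __name__ == "__main__":
    watermarked = generate_key_sequence(200, flip_prob=0.9, seed=1)
    unmarked = generate_key_sequence(200, flip_prob=0.5, seed=2)
    print("watermarked z:", pattern_z_score(watermarked))
    print("unmarked z:", pattern_z_score(unmarked))
```

In this sketch a key sequence recovered from watermarked output would yield a large positive z-score, while keys recovered from unwatermarked output should score near zero; a detection threshold on the z-score then trades off false positives against missed detections.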

