CRAI Sep 26, 2024

Multi-Designated Detector Watermarking for Language Models

arXiv:2409.17518v2, 3 citations, h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses the need for secure and flexible ownership assertion over LLM outputs, particularly for model providers. The advance is incremental, since it builds on existing watermarking and signature techniques.

The paper tackles watermarking of large language model outputs so that only specific designated detectors can identify the watermarks, without degrading output quality for ordinary users. It presents a framework built on multi-designated verifier signatures (MDVS) and reports satisfactory performance for its implementation.

In this paper, we initiate the study of multi-designated detector watermarking (MDDW) for large language models (LLMs). This technique allows model providers to generate watermarked outputs from LLMs with two key properties: (i) only specific, possibly multiple, designated detectors can identify the watermarks, and (ii) there is no perceptible degradation in the output quality for ordinary users. We formalize the security definitions for MDDW and present a framework for constructing MDDW for any LLM using multi-designated verifier signatures (MDVS). Recognizing the significant economic value of LLM outputs, we introduce claimability as an optional security feature for MDDW, enabling model providers to assert ownership of LLM outputs within designated-detector settings. To support claimable MDDW, we propose a generic transformation converting any MDVS to a claimable MDVS. Our implementation of the MDDW scheme highlights its advanced functionalities and flexibility over existing methods, with satisfactory performance metrics.

