CL | Oct 17, 2023

Watermarking LLMs with Weight Quantization

arXiv:2310.11237v1 | 139 citations | h-index: 24 | Has Code
Originality: Incremental advance
AI Analysis

The work addresses the need to protect model weights as LLMs are deployed rapidly, and it offers a practical mechanism for license enforcement. The advance is incremental because it builds on existing quantization techniques.

The paper addresses the protection of open-source large language model weights against malicious use. It proposes a watermarking strategy that embeds the watermark during weight quantization: the watermark stays hidden when the model runs in int8 mode but is detectable in fp32 mode. The authors demonstrate the approach on models including GPT-Neo and LLaMA.

Abuse of large language models reveals high risks as large language models are being deployed at an astonishing speed. It is important to protect the model weights to avoid malicious usage that violates licenses of open-source large language models. This paper proposes a novel watermarking strategy that plants watermarks in the quantization process of large language models without pre-defined triggers during inference. The watermark works when the model is used in the fp32 mode and remains hidden when the model is quantized to int8. In this way, users can only run inference with the model, without further supervised fine-tuning. We successfully plant the watermark into open-source large language model weights including GPT-Neo and LLaMA. We hope our proposed method can provide a potential direction for protecting model weights in the era of large language model applications.
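
The fp32-versus-int8 behavior described above can be illustrated with a small sketch. It is not the authors' training procedure. It assumes a simple symmetric absmax quantizer and uses random offsets as a stand-in for a learned watermark update. The helper names `quantize_int8` and `perturb_within_bins` are hypothetical, and Python floats stand in for fp32 values. The point it shows: full-precision weights can be moved inside their rounding intervals, so the full-precision model changes while the int8 codes stay identical.

```python
import random


def quantize_int8(weights):
    """Symmetric absmax quantization: returns (int8 codes, scale).

    Assumes at least one weight is nonzero.
    """
    scale = max(abs(w) for w in weights) / 127.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def perturb_within_bins(weights, rng, margin=0.45):
    """Shift each weight inside its own rounding interval.

    Hypothetical helper: the random offsets stand in for a learned
    watermark update. margin < 0.5 keeps every weight in its int8 bin.
    """
    codes, scale = quantize_int8(weights)
    absmax = max(abs(w) for w in weights)
    top = max(range(len(weights)), key=lambda i: abs(weights[i]))
    marked = []
    for i, code in enumerate(codes):
        if i == top:
            # Keep the largest-magnitude weight so the scale is unchanged.
            marked.append(weights[i])
            continue
        w = (code + rng.uniform(-margin, margin)) * scale
        # Clip so no perturbed weight exceeds the original absmax.
        marked.append(max(-absmax, min(absmax, w)))
    return marked


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(64)]
marked = perturb_within_bins(weights, rng)

# Full-precision weights differ, but the quantized view is identical.
print(marked != weights)
print(quantize_int8(marked) == quantize_int8(weights))
```

In this toy setting, anyone who only sees the int8 codes cannot tell the two weight sets apart, while the full-precision weights carry the change. How the paper shapes that change into a detectable watermark is not specified in the abstract and is not modeled here.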

Code Implementations: 1 repo
