CL · AI · Oct 20, 2025

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

arXiv:2510.18019v1 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This addresses the fairness gap in watermarking across diverse languages, though it is an incremental improvement on existing methods.

The paper shows that existing multilingual watermarking methods for LLM outputs lose their robustness under translation attacks in medium- and low-resource languages. It introduces STEAM, a back-translation-based detection method that achieves average gains of +0.19 AUC and +40 percentage points in TPR at a 1% false-positive rate (TPR@1%) across 17 languages.
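Both reported metrics are computed from detector scores on watermarked (positive) and unwatermarked (negative) texts. The following is a minimal sketch using the standard definitions, assuming higher scores mean "more likely watermarked". The function names are illustrative, and the paper's exact evaluation protocol is not described on this page.

```python
def tpr_at_fpr(neg_scores, pos_scores, target_fpr=0.01):
    """True-positive rate when the threshold lets at most target_fpr of negatives through."""
    negs = sorted(neg_scores, reverse=True)
    # Number of negatives allowed strictly above the threshold (epsilon guards float error)
    k = int(target_fpr * len(negs) + 1e-9)
    # At most k negatives are strictly greater than negs[k]
    threshold = negs[k] if k < len(negs) else float("-inf")
    detected = sum(1 for s in pos_scores if s > threshold)
    return detected / len(pos_scores)


def auc(neg_scores, pos_scores):
    """ROC AUC as the probability that a random positive outscores a random negative."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


negatives = [float(i) for i in range(100)]  # toy unwatermarked scores 0..99
positives = [50.0, 99.5, 120.0, 200.0]      # toy watermarked scores
print(tpr_at_fpr(negatives, positives))      # 0.75
print(auc(negatives, positives))             # 0.87625
```

Read this way, a "+40%p" gain means the detected fraction at the same 1% false-positive budget rises by 0.40 in absolute terms.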

Multilingual watermarking aims to make large language model (LLM) outputs traceable across languages, yet current methods still fall short. Despite claims of cross-lingual robustness, they are evaluated only on high-resource languages. We show that existing multilingual watermarking methods are not truly multilingual: they fail to remain robust under translation attacks in medium- and low-resource languages. We trace this failure to semantic clustering, which fails when the tokenizer vocabulary contains too few full-word tokens for a given language. To address this, we introduce STEAM, a back-translation-based detection method that restores watermark strength lost through translation. STEAM is compatible with any watermarking method, robust across different tokenizers and languages, non-invasive, and easily extendable to new languages. With average gains of +0.19 AUC and +40%p TPR@1% on 17 languages, STEAM provides a simple and robust path toward fairer watermarking across diverse languages.
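The idea of back-translation-based detection can be illustrated with a toy sketch. The suspect text is scored as-is and again after translating it back into each candidate language, and the strongest watermark evidence is kept. Several assumptions apply. The detector is a generic KGW-style green-list z-test with whitespace tokenization, `translate` is a hypothetical user-supplied machine-translation callable, and taking the maximum over candidates is an assumed aggregation rule. None of this is STEAM's published implementation.

```python
import hashlib
import math
from typing import Callable, Iterable

GAMMA = 0.25  # assumed fraction of the vocabulary that is "green" at each step


def is_green(prev_tok: str, tok: str, key: str = "secret") -> bool:
    # Hypothetical KGW-style rule: hash (key, previous token, token) into [0, 1)
    digest = hashlib.sha256(f"{key}|{prev_tok}|{tok}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def z_score(tokens: list[str]) -> float:
    # One-proportion z-test: are there more green tokens than GAMMA predicts?
    t = len(tokens) - 1
    if t <= 0:
        return 0.0
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (greens - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))


def back_translation_score(
    text: str,
    translate: Callable[[str, str], str],  # hypothetical MT function: (text, lang) -> text
    langs: Iterable[str],
) -> float:
    # Score the original text plus one back-translation per candidate language,
    # keeping the highest z-score (assumed aggregation rule)
    candidates = [text] + [translate(text, lang) for lang in langs]
    return max(z_score(c.split()) for c in candidates)


# Usage with an identity "translator" as a stand-in for a real MT system
print(back_translation_score("the quick brown fox", lambda t, lang: t, ["en", "de"]))
```

Taking a maximum over several candidates raises scores on unwatermarked text too. Any detection threshold should therefore be calibrated on human-written text passed through the same back-translation pipeline, which is what a TPR-at-fixed-FPR evaluation measures.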

