CR · Mar 12

EmbTracker: Traceable Black-box Watermarking for Federated Language Models

Haodong Zhao, Jinming Hu, Yijie Bai, Tian Dong, Wei Du, Zhuosheng Zhang, Yanjiao Chen, Haojin Zhu, Gongshen Liu

arXiv:2603.12089v1 · 18.8 · h-index: 10

Predicted impact: top 17% in CR · last 90 days

Originality: Highly original

AI Analysis

This paper addresses the problem of individual client traceability in FedLMs for secure collaborative learning, offering a novel solution rather than an incremental improvement.

The paper tackles the vulnerability of Federated Language Models (FedLMs) to model leakage by untrustworthy clients, proposing EmbTracker, a server-side black-box watermarking framework that achieves robust traceability with verification rates near 100% and minimal performance impact (within 1-2%).

Federated Language Models (FedLMs) allow collaborative learning without sharing raw data, yet they introduce a critical vulnerability: any untrustworthy client may leak the functional model instance it receives. Current watermarking schemes for FedLMs often require white-box access and client-side cooperation, and provide only group-level proof of ownership rather than individual traceability. We propose EmbTracker, a server-side, traceable black-box watermarking framework specifically designed for FedLMs. EmbTracker achieves black-box verifiability by embedding a backdoor-based watermark detectable through simple API queries. Client-level traceability is realized by injecting unique identity-specific watermarks into the model distributed to each client. In this way, a leaked model can be attributed to a specific culprit, ensuring robustness even against non-cooperative participants. Extensive experiments on various language and vision-language models demonstrate that EmbTracker achieves robust traceability with verification rates near 100%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary task performance (typically within 1-2%).
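
The black-box tracing idea described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the trigger derivation (`make_triggers`), the watermark output `TARGET_OUTPUT`, the stand-in model (`make_toy_model`), and the 0.9 threshold are all hypothetical choices made for illustration. The sketch only shows the verification logic: query a suspect model's API with each client's identity-specific triggers and attribute the leak to the client whose triggers elicit the watermark response.

```python
import hashlib
from typing import Callable, List, Optional

TARGET_OUTPUT = "WM"  # hypothetical watermark response, not from the paper


def make_triggers(client_id: str, n: int = 8) -> List[str]:
    """Derive n identity-specific trigger strings from a client id (illustrative)."""
    return [
        hashlib.sha256(f"{client_id}:{i}".encode()).hexdigest()[:12]
        for i in range(n)
    ]


def make_toy_model(watermarked_for: str) -> Callable[[str], str]:
    """Stand-in for a model API that carries one client's backdoor watermark."""
    triggers = set(make_triggers(watermarked_for))

    def query(text: str) -> str:
        # A real model would run inference; this toy only checks for triggers.
        return TARGET_OUTPUT if text in triggers else "clean"

    return query


def verification_rate(query: Callable[[str], str], client_id: str) -> float:
    """Fraction of a client's triggers that elicit the watermark response."""
    triggers = make_triggers(client_id)
    hits = sum(query(t) == TARGET_OUTPUT for t in triggers)
    return hits / len(triggers)


def trace_leak(
    query: Callable[[str], str],
    client_ids: List[str],
    threshold: float = 0.9,  # assumed decision threshold
) -> Optional[str]:
    """Return the client whose watermark verifies, or None if none passes."""
    rates = {cid: verification_rate(query, cid) for cid in client_ids}
    best = max(rates, key=rates.get)
    return best if rates[best] >= threshold else None


clients = ["client_0", "client_1", "client_2"]
leaked_model = make_toy_model("client_1")
culprit = trace_leak(leaked_model, clients)
```

Because verification needs only the `query` callable, the sketch mirrors the black-box setting: no access to weights or client cooperation is assumed, only the ability to send inputs and read outputs.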
