CR AI LG · Aug 6, 2025

AuthPrint: Fingerprinting Generative Models Against Malicious Model Providers

arXiv:2508.05691v2 · 3 citations · h-index: 1
Originality: Highly original
AI Analysis

The paper addresses the need for provenance attribution in high-stakes domains where the model provider itself might act adversarially, a threat model that is new to fingerprinting.

It tackles the problem of verifying that generative model outputs originate from a certified model, even when the provider may maliciously substitute a cheaper or lower-quality version. In experiments on GANs and diffusion models, it achieves a near-zero false positive rate at a 95% true positive rate (FPR@95%TPR).

Generative models are increasingly adopted in high-stakes domains, yet current deployments offer no mechanisms to verify whether a given output truly originates from the certified model. We address this gap by extending model fingerprinting techniques beyond the traditional collaborative setting to one where the model provider itself may act adversarially, replacing the certified model with a cheaper or lower-quality substitute. To our knowledge, this is the first work to study fingerprinting for provenance attribution under such a threat model. Our approach introduces a trusted verifier that, during a certification phase, extracts hidden fingerprints from the authentic model's output space and trains a detector to recognize them. During verification, this detector can determine whether new outputs are consistent with the certified model, without requiring specialized hardware or model modifications. In extensive experiments, our methods achieve near-zero FPR@95%TPR on both GANs and diffusion models, and remain effective even against subtle architectural or training changes. Furthermore, the approach is robust to adaptive adversaries that actively manipulate outputs in an attempt to evade detection.
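The FPR@95%TPR figure quoted in the abstract can be made concrete with a minimal sketch. It assumes the detector emits one scalar score per output, that higher scores mean "more consistent with the certified model", and that "positive" means an output from the certified model. These conventions and the function name `fpr_at_tpr` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fpr_at_tpr(pos_scores, neg_scores, target_tpr=0.95):
    """Return (fpr, tpr, threshold) at the strictest threshold that still
    accepts at least `target_tpr` of the positive scores.

    Assumed convention (not from the paper): positives are scores for
    outputs of the certified model, negatives are scores for outputs of
    a substitute model, and an output is accepted if score >= threshold.
    """
    pos = np.sort(np.asarray(pos_scores, dtype=float))  # ascending order
    neg = np.asarray(neg_scores, dtype=float)

    # Number of positives we may reject while keeping TPR >= target_tpr.
    k = int(np.floor((1.0 - target_tpr) * len(pos)))
    threshold = pos[k]

    tpr = float(np.mean(pos >= threshold))  # certified outputs accepted
    fpr = float(np.mean(neg >= threshold))  # substitute outputs accepted
    return fpr, tpr, threshold


if __name__ == "__main__":
    # Synthetic scores for illustration only; these are not paper results.
    rng = np.random.default_rng(0)
    certified = rng.normal(loc=3.0, scale=1.0, size=1000)
    substitute = rng.normal(loc=0.0, scale=1.0, size=1000)
    fpr, tpr, thr = fpr_at_tpr(certified, substitute)
    print(f"threshold={thr:.3f}  TPR={tpr:.3f}  FPR={fpr:.3f}")
```

Under these assumptions, a near-zero FPR@95%TPR means that at a threshold accepting 95% of the certified model's outputs, almost no outputs of a substitute model are accepted.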

