CV CR · Oct 30, 2024

One Prompt to Verify Your Models: Black-Box Text-to-Image Models Verification via Non-Transferable Adversarial Attacks

Ji Guo, Wenbo Jiang, Rui Zhang, Guoming Lu, Hongwei Li

arXiv:2410.22725v46.54 citations · h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses a security concern for users of third-party T2I model platforms, namely confirming that the model served is the one advertised, though it is an incremental improvement over existing verification techniques.

The paper tackles the problem of verifying whether third-party text-to-image (T2I) model services provide the claimed models by proposing VerifyPrompt, a method that uses non-transferable adversarial prompts to distinguish models, achieving over 90% accuracy in experiments.

Recently, various types of Text-to-Image (T2I) models have emerged (such as DALL-E and Stable Diffusion), each showing advantages in different aspects. As a result, some third-party service platforms collect different model interfaces and offer cheaper API services and more flexibility in T2I model selection. However, this raises a new security concern: are these third-party services truly offering the models they claim? To answer this question, we first define the concept of T2I model verification, which aims to determine whether a black-box target model is identical to a given white-box reference T2I model. We then propose VerifyPrompt, which performs T2I model verification through a specially designed verify prompt. Intuitively, the verify prompt is an adversarial prompt for the target model that does not transfer to other models: it makes the target model generate a specific image while making other models produce entirely different images. Specifically, VerifyPrompt uses the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to optimize the cosine similarity of a prompt's text encoding, generating verify prompts. Finally, by computing the CLIP-text similarity scores between the prompts and the generated images, VerifyPrompt can determine whether the target model aligns with the reference model. Experimental results demonstrate that VerifyPrompt consistently achieves over 90% accuracy across various T2I models, confirming its effectiveness on practical model platforms (such as Hugging Face).
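
The final decision step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that embeddings for the expected text and for each generated image have already been computed by some CLIP-style encoder, and the function names, the `threshold` of 0.8, and the `min_fraction` voting rule are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def verify_model(
    expected_text_embedding: Sequence[float],
    image_embeddings: Sequence[Sequence[float]],
    threshold: float = 0.8,      # hypothetical cut-off, not taken from the paper
    min_fraction: float = 0.5,   # hypothetical voting rule, not taken from the paper
) -> bool:
    """Return True if enough generated images match the expected content.

    expected_text_embedding: embedding of the text describing the image the
        verify prompt is meant to elicit from the genuine model (assumed input).
    image_embeddings: embeddings of images returned by the black-box service
        when queried with the verify prompts (assumed input).
    """
    if not image_embeddings:
        raise ValueError("at least one image embedding is required")
    scores = [cosine_similarity(expected_text_embedding, e) for e in image_embeddings]
    hits = sum(1 for s in scores if s >= threshold)
    return hits / len(scores) >= min_fraction
```

Under these assumptions, a service that returns images close to the expected content passes the check, while a substituted model, for which the verify prompt does not transfer, would yield low similarity scores and fail it.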
