CL · AI · CR · SE · Feb 19, 2024

Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation

arXiv:2402.12100v1 · 15 citations · h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses safety concerns for users of text-to-image models by providing a more effective adversarial testing method, though it is an incremental advance that builds on existing testing techniques.

The paper tackles the problem of testing text-to-image generative models for safety vulnerabilities by introducing Groot, an automated framework that uses tree-based semantic transformation to refine adversarial prompts, achieving a 93.66% success rate on models like DALL-E 3 and Midjourney.

With the prevalence of text-to-image generative models, their safety becomes a critical concern. Adversarial testing techniques have been developed to probe whether such models can be prompted to produce Not-Safe-For-Work (NSFW) content. However, existing solutions face several challenges, including low success rates and inefficiency. We introduce Groot, the first automated framework leveraging tree-based semantic transformation for adversarial testing of text-to-image models. Groot employs semantic decomposition and sensitive element drowning strategies in conjunction with LLMs to systematically refine adversarial prompts. Our comprehensive evaluation confirms the efficacy of Groot, which not only exceeds the performance of current state-of-the-art approaches but also achieves a remarkable success rate (93.66%) on leading text-to-image models such as DALL-E 3 and Midjourney.

