CV · CL · IV · Apr 1, 2025

ShieldGemma 2: Robust and Tractable Image Content Moderation

arXiv:2504.01081v2 · 22 citations · h-index: 12
Originality: Incremental advance
AI Analysis

ShieldGemma 2 provides an open tool for multimodal safety and responsible AI development, though it appears incremental because it builds on the existing Gemma 3 architecture.

The authors tackle image content moderation with ShieldGemma 2, a 4B-parameter model that they report achieves state-of-the-art performance on both synthetic and natural images across key harm categories.

We introduce ShieldGemma 2, a 4B-parameter image content moderation model built on Gemma 3. The model provides robust safety risk predictions across the following key harm categories: Sexually Explicit, Violence & Gore, and Dangerous Content, for both synthetic images (e.g., the output of any image generation model) and natural images (e.g., any image input to a Vision-Language Model). We evaluate on both internal and external benchmarks and demonstrate state-of-the-art performance, measured against our policies, compared to LlavaGuard (Helff et al., 2024), GPT-4o mini (Hurst et al., 2024), and the base Gemma 3 model (Gemma Team, 2025). Additionally, we present a novel adversarial data generation pipeline that enables controlled, diverse, and robust image generation. ShieldGemma 2 provides an open image moderation tool to advance multimodal safety and responsible AI development.
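The abstract does not say how a per-category "safety risk prediction" becomes a usable score. The sketch below is a minimal illustration under an assumption not stated above: each harm category is posed to the model as a Yes/No policy question, and the risk score is the probability of the "Yes" answer token renormalized against "No", i.e. P(Yes) = e^y / (e^y + e^n), which equals sigmoid(y - n). The function names, the 0.5 threshold, and the logit values are hypothetical and are not taken from the paper or from any released API.

```python
import math

# The three harm categories named in the abstract.
HARM_CATEGORIES = ("Sexually Explicit", "Violence & Gore", "Dangerous Content")


def yes_probability(yes_logit: float, no_logit: float) -> float:
    """Softmax over the two answer tokens only: P(Yes) = e^y / (e^y + e^n).

    Computed as sigmoid(yes_logit - no_logit), branching on the sign of the
    difference so that math.exp never overflows.
    """
    diff = yes_logit - no_logit
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)
    return z / (1.0 + z)


def moderate(
    logits: dict[str, tuple[float, float]], threshold: float = 0.5
) -> dict[str, tuple[float, bool]]:
    """Map each category's hypothetical (yes_logit, no_logit) pair to (score, flagged)."""
    results = {}
    for category in HARM_CATEGORIES:
        yes_logit, no_logit = logits[category]
        score = yes_probability(yes_logit, no_logit)
        # Hypothetical decision rule: flag when the score reaches the threshold.
        results[category] = (score, score >= threshold)
    return results


if __name__ == "__main__":
    # Made-up logits for illustration only; these are not model outputs.
    example = {
        "Sexually Explicit": (-3.0, 2.0),
        "Violence & Gore": (1.5, -0.5),
        "Dangerous Content": (0.0, 0.0),
    }
    for category, (score, flagged) in moderate(example).items():
        print(f"{category}: P(Yes)={score:.3f} flagged={flagged}")
```

Renormalizing over just the two answer tokens keeps the score in [0, 1] regardless of how much probability mass the model puts on other tokens; in practice the threshold would be tuned per policy rather than fixed at 0.5.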
