CR AI Oct 20, 2025

CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks

arXiv:2510.17687v1
1 citation
h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses a critical security problem for MLLM users by providing a robust defense against underexplored implicit attacks. It is incremental, however, because it builds on automated data generation to improve detection.

The paper tackles the vulnerability of Multimodal Large Language Models (MLLMs) to joint-modal implicit malicious attacks, in which individually benign text and image inputs combine to express unsafe intent. It proposes CrossGuard, an intent-aware safeguard that significantly outperforms existing defenses in both security and utility across various benchmarks and settings.

Multimodal Large Language Models (MLLMs) achieve strong reasoning and perception capabilities but are increasingly vulnerable to jailbreak attacks. While existing work focuses on explicit attacks, where malicious content resides in a single modality, recent studies reveal implicit attacks, in which benign text and image inputs jointly express unsafe intent. Such joint-modal threats are difficult to detect and remain underexplored, largely due to the scarcity of high-quality implicit data. We propose ImpForge, an automated red-teaming pipeline that leverages reinforcement learning with tailored reward modules to generate diverse implicit samples across 14 domains. Building on this dataset, we further develop CrossGuard, an intent-aware safeguard providing robust and comprehensive defense against both explicit and implicit threats. Extensive experiments across safe and unsafe benchmarks, implicit and explicit attacks, and multiple out-of-domain settings demonstrate that CrossGuard significantly outperforms existing defenses, including advanced MLLMs and guardrails, achieving stronger security while maintaining high utility. This offers a balanced and practical solution for enhancing MLLM robustness against real-world multimodal threats.

