CV | CL | Apr 19, 2025

Towards Explainable Fake Image Detection with Multi-Modal Large Language Models

arXiv:2504.14245v2 | 9 citations | h-index: 8 | Has Code | MM
Originality: Incremental advance
AI Analysis

This work addresses public security concerns by making fake image detection more explainable, though it appears incremental because it builds on existing MLLM capabilities.

The paper tackles fake image detection by proposing a framework built on Multi-modal Large Language Models (MLLMs) to improve generalization and transparency. It evaluates MLLMs against traditional detection methods and human evaluators, highlighting the strengths and limitations of the MLLM-based approach.

Progress in image generation raises significant public security concerns. We argue that fake image detection should not operate as a "black box". Instead, an ideal approach must ensure both strong generalization and transparency. Recent progress in Multi-modal Large Language Models (MLLMs) offers new opportunities for reasoning-based AI-generated image detection. In this work, we evaluate the capabilities of MLLMs in comparison to traditional detection methods and human evaluators, highlighting their strengths and limitations. Furthermore, we design six distinct prompts and propose a framework that integrates these prompts to develop a more robust, explainable, and reasoning-driven detection system. The code is available at https://github.com/Gennadiyev/mllm-defake.
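
The abstract does not say how the six prompts are integrated, so the following is only a minimal sketch under stated assumptions: each prompt yields a real/fake label plus a short explanation, and the labels are combined by majority vote. The prompt texts, the `ask_model` callable, and the tie-breaking rule are all hypothetical and are not taken from the paper or its repository.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt templates; the paper's six prompts are not reproduced here.
PROMPTS: List[str] = [
    "Is this image real or AI-generated? Explain your reasoning.",
    "Inspect textures and lighting. Is the image real or fake?",
    "Check anatomy and object geometry. Is the image real or fake?",
]

# (label, explanation), where label is "real" or "fake".
Verdict = Tuple[str, str]


def detect(
    image_path: str,
    ask_model: Callable[[str, str], Verdict],
    prompts: List[str] = PROMPTS,
) -> Dict[str, object]:
    """Query the model once per prompt and aggregate labels by majority vote."""
    verdicts = [ask_model(image_path, prompt) for prompt in prompts]
    counts = Counter(label for label, _ in verdicts)
    # Assumption: ties break toward "fake" as the cautious default.
    label = "fake" if counts["fake"] >= counts["real"] else "real"
    # Keep only the explanations that support the final label.
    explanations = [text for lab, text in verdicts if lab == label]
    return {"label": label, "votes": dict(counts), "explanations": explanations}


def stub_model(image_path: str, prompt: str) -> Verdict:
    """Stand-in for a real MLLM call; it keys off the prompt text only."""
    if "textures" in prompt:
        return ("real", "Textures look natural.")
    return ("fake", "Inconsistent details detected.")


result = detect("example.png", stub_model)
```

Returning the supporting explanations alongside the label keeps the output inspectable, which is the point of a reasoning-driven detector. A real system would replace `stub_model` with an actual MLLM call and might weight prompts rather than count them equally.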

