CV · Nov 26, 2025

AVFakeBench: A Comprehensive Audio-Video Forgery Detection Benchmark for AV-LMMs

arXiv:2511.21251v2 · 1 citation
Originality: Incremental advance
AI Analysis

AVFakeBench addresses the shortage of benchmarks for audio-video forgery detection, a gap that affects both researchers and practitioners. The contribution is incremental because it builds on existing forgery detection efforts.

The authors address the lack of diverse and complex benchmarks for audio-video forgery detection by introducing AVFakeBench, a benchmark with 12K questions across seven forgery types. They evaluate 11 AV-LMMs and 2 detection methods on it, and the results reveal weaknesses in fine-grained perception and reasoning.

The threat of Audio-Video (AV) forgery is rapidly evolving beyond human-centric deepfakes to include more diverse manipulations across complex natural scenes. However, existing benchmarks are still confined to DeepFake-based forgeries and single-granularity annotations, and thus fail to capture the diversity and complexity of real-world forgery scenarios. To address this, we introduce AVFakeBench, the first comprehensive audio-video forgery detection benchmark that spans rich forgery semantics across both human and general subjects. AVFakeBench comprises 12K carefully curated audio-video questions, covering seven forgery types and four levels of annotation. To ensure high-quality and diverse forgeries, we propose a multi-stage hybrid forgery framework that integrates proprietary models for task planning with expert generative models for precise manipulation. The benchmark establishes a multi-task evaluation framework covering binary judgment, forgery type classification, forgery detail selection, and explanatory reasoning. We evaluate 11 Audio-Video Large Language Models (AV-LMMs) and 2 prevalent detection methods on AVFakeBench, demonstrating the potential of AV-LMMs as emerging forgery detectors while revealing their notable weaknesses in fine-grained perception and reasoning.
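
The multi-task evaluation described in the abstract can be illustrated with a minimal sketch of per-task accuracy scoring. This is not the authors' evaluation code: the `Item` record, its field names, the task identifiers, and the option-letter answer format are all assumptions made for illustration. Only the three closed-form tasks are scored here; explanatory reasoning is free-form and would need a separate scoring method that the abstract does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass

# Task names mirror the abstract; the identifiers themselves are hypothetical.
TASKS = ("binary_judgment", "forgery_type", "forgery_detail")


@dataclass(frozen=True)
class Item:
    """One scored question (hypothetical record layout, not the benchmark's schema)."""
    task: str        # one of TASKS
    answer: str      # gold option label, e.g. "A"
    prediction: str  # option label chosen by the model


def accuracy_by_task(items):
    """Return {task: fraction of that task's items answered correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.task] += 1
        # Compare option labels case-insensitively, ignoring stray whitespace.
        if item.prediction.strip().upper() == item.answer.strip().upper():
            correct[item.task] += 1
    return {task: correct[task] / total[task] for task in total}


# Toy data, invented purely to exercise the function.
items = [
    Item("binary_judgment", "A", "A"),
    Item("binary_judgment", "B", "A"),
    Item("forgery_type", "C", "c"),
    Item("forgery_detail", "D", "B"),
]
scores = accuracy_by_task(items)
```

Reporting accuracy per task rather than as a single pooled number keeps a strong binary-judgment score from masking weaker results on the finer-grained tasks, which is the kind of gap the abstract highlights.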

