CV, Feb 9

VideoVeritas: AI-Generated Video Detection via Perception Pretext Reinforcement Learning

arXiv:2602.08828v1, 6 citations, h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses the security risks that video generation poses for online platforms and media verification, but the advance appears incremental: it builds on existing multi-modal large language models and adds enhancements to them.

The paper tackles the problem of detecting AI-generated videos by introducing VideoVeritas, a framework that integrates fine-grained perception and fact-based reasoning, achieving more balanced performance across diverse benchmarks compared to existing methods.

The growing capability of video generation poses escalating security risks, making reliable detection increasingly essential. In this paper, we introduce VideoVeritas, a framework that integrates fine-grained perception and fact-based reasoning. We observe that while current multi-modal large language models (MLLMs) exhibit strong reasoning capacity, their granular perception ability remains limited. To mitigate this, we introduce Joint Preference Alignment and Perception Pretext Reinforcement Learning (PPRL). Specifically, rather than directly optimizing for the detection task, we adopt general spatiotemporal grounding and self-supervised object counting in the RL stage, enhancing detection performance with simple perception pretext tasks. To facilitate robust evaluation, we further introduce MintVid, a light yet high-quality dataset containing 3K videos from 9 state-of-the-art generators, along with a real-world collected subset that has factual errors in content. Experimental results demonstrate that existing methods tend to bias towards either superficial reasoning or mechanical analysis, while VideoVeritas achieves more balanced performance across diverse benchmarks.
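
The abstract names two perception pretext tasks for the RL stage, spatiotemporal grounding and object counting, but does not define their rewards. The sketch below shows one common way such verifiable rewards could be written, purely as an illustration. Every name in it (`box_iou`, `temporal_iou`, `grounding_reward`, `counting_reward`), the box and span formats, the equal weighting, and the exact-match counting rule are assumptions, not details taken from the paper.

```python
from typing import Tuple

# Hypothetical formats (assumptions, not from the paper):
Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Span = Tuple[float, float]                # (start_sec, end_sec)


def box_iou(pred: Box, target: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = max(0.0, pred[2] - pred[0]) * max(0.0, pred[3] - pred[1])
    area_t = max(0.0, target[2] - target[0]) * max(0.0, target[3] - target[1])
    union = area_p + area_t - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(pred: Span, target: Span) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(pred[1], target[1]) - max(pred[0], target[0]))
    union = max(0.0, pred[1] - pred[0]) + max(0.0, target[1] - target[0]) - inter
    return inter / union if union > 0 else 0.0


def grounding_reward(pred_box: Box, pred_span: Span,
                     gt_box: Box, gt_span: Span) -> float:
    """Assumed reward: equal-weight mix of spatial and temporal overlap."""
    return 0.5 * box_iou(pred_box, gt_box) + 0.5 * temporal_iou(pred_span, gt_span)


def counting_reward(pred_count: int, true_count: int) -> float:
    """Assumed reward: 1.0 for an exact count, 0.0 otherwise."""
    return 1.0 if pred_count == true_count else 0.0
```

Both rewards can be checked automatically against a known answer, which is what makes tasks of this kind usable as RL signals without detection labels. How the paper actually scores or combines them is not stated in the abstract.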
