CV · Dec 15, 2025

Grab-3D: Detecting AI-Generated Videos from 3D Geometric Temporal Consistency

arXiv:2512.13665v1 · h-index: 19
Originality: Highly original
AI Analysis

This work addresses the need for reliable detection of AI-generated videos, which is crucial for security and media integrity. It is somewhat incremental, since it builds on existing detection methods and adds an explicit focus on 3D geometry.

The paper tackles the detection of AI-generated videos by analyzing 3D geometric temporal consistency. It reveals discrepancies in geometric patterns between real and generated videos and introduces Grab-3D, which the authors report significantly outperforms state-of-the-art detectors and generalizes robustly across domains.
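The paper uses vanishing points as its explicit representation of 3D geometry. To make "geometric temporal consistency" concrete, here is a minimal sketch that is not the authors' pipeline. It assumes line segments have already been detected in each frame (for example by an off-the-shelf line-segment detector), estimates one vanishing point per frame as the least-squares intersection of those lines in homogeneous coordinates, and scores consistency as the mean angle between consecutive frames' vanishing points. The function names and the drift score are hypothetical illustrations.

```python
import numpy as np


def segment_to_line(p1, p2):
    """Homogeneous line through two image points (x, y).

    Hypothetical helper: scaled so that |l . x| is a point-to-line distance in pixels.
    """
    a = np.array([p1[0], p1[1], 1.0])
    b = np.array([p2[0], p2[1], 1.0])
    line = np.cross(a, b)
    return line / np.linalg.norm(line[:2])


def estimate_vanishing_point(segments):
    """Least-squares intersection of two or more line segments.

    Returns the unit homogeneous 3-vector v minimising sum_i (l_i . v)^2,
    i.e. the right singular vector with the smallest singular value.
    A vanishing point at infinity (parallel image lines) has v[2] close to 0.
    """
    lines = np.stack([segment_to_line(p1, p2) for p1, p2 in segments])
    _, _, vt = np.linalg.svd(lines)
    v = vt[-1]
    return v / np.linalg.norm(v)


def vp_temporal_drift(vps):
    """Hypothetical consistency proxy: mean angle (radians) between the
    vanishing points of consecutive frames, treated as unit vectors."""
    vps = np.asarray(vps, dtype=float)
    # abs(): v and -v represent the same homogeneous point
    cos = np.abs(np.sum(vps[:-1] * vps[1:], axis=1))
    return float(np.mean(np.arccos(np.clip(cos, 0.0, 1.0))))
```

For example, three segments that all point toward pixel (100, 50) yield a vanishing point that dehomogenizes to (100, 50). Under this toy proxy, a static scene would be expected to give small, smooth drift across frames; the paper's finding is that generated videos show geometric inconsistencies, but its actual features and scoring are learned by a transformer rather than computed by this formula.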

Recent advances in diffusion-based generation techniques enable AI models to produce highly realistic videos, heightening the need for reliable detection mechanisms. However, existing detection methods provide only limited exploration of the 3D geometric patterns present in generated videos. In this paper, we use vanishing points as an explicit representation of 3D geometry patterns, revealing fundamental discrepancies in geometric consistency between real and AI-generated videos. We introduce Grab-3D, a geometry-aware transformer framework for detecting AI-generated videos based on 3D geometric temporal consistency. To enable reliable evaluation, we construct an AI-generated video dataset of static scenes, allowing stable 3D geometric feature extraction. We propose a geometry-aware transformer equipped with geometric positional encoding, temporal-geometric attention, and an EMA-based geometric classifier head to explicitly inject 3D geometric awareness into temporal modeling. Experiments demonstrate that Grab-3D significantly outperforms state-of-the-art detectors, achieving robust cross-domain generalization to unseen generators.
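The abstract names an EMA-based geometric classifier head but does not spell out its form. The sketch below assumes one plausible reading: per-frame feature vectors are smoothed along time with an exponential moving average, and the final smoothed state is fed to a logistic classifier. The names `temporal_ema`, `ema_classifier_head`, `alpha`, `weight`, and `bias` are hypothetical, and the real head may apply the EMA to different quantities (for example to logits or attention outputs).

```python
import numpy as np


def temporal_ema(features, alpha=0.9):
    """EMA along the time axis: s_t = alpha * s_{t-1} + (1 - alpha) * x_t.

    features: array of shape (T, D), one feature vector per frame.
    Returns the smoothed state after the last frame, shape (D,).
    """
    features = np.asarray(features, dtype=float)
    state = features[0].copy()  # initialise the state with the first frame
    for x in features[1:]:
        state = alpha * state + (1.0 - alpha) * x
    return state


def ema_classifier_head(features, weight, bias, alpha=0.9):
    """Hypothetical head: EMA-pool frame features, then a logistic score.

    Returns the probability that the clip is AI-generated under this toy model.
    """
    pooled = temporal_ema(features, alpha)
    logit = float(pooled @ np.asarray(weight, dtype=float) + bias)
    return 1.0 / (1.0 + np.exp(-logit))
```

Compared with mean pooling, an EMA weights recent frames more heavily while still carrying information from earlier ones; `alpha` controls how long that memory is.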
