CV Nov 12, 2025

DBINDS -- Can Initial Noise from Diffusion Model Inversion Help Reveal AI-Generated Videos?

arXiv:2511.09184v1 | 1 citation | h-index: 1

Originality: Highly original

AI Analysis

This work addresses the content-security and forensic-analysis challenges posed by advanced AI-generated videos and reports better generalization than existing detectors.

The paper tackles the problem of detecting AI-generated videos by proposing DBINDS, a detector that analyzes latent-space dynamics from diffusion model inversion instead of pixel-level cues. It achieves strong cross-generator performance on GenVidBench, demonstrating good generalization and robustness in limited-data settings.

AI-generated video has advanced rapidly and poses serious challenges to content security and forensic analysis. Existing detectors rely mainly on pixel-level visual cues and generalize poorly to unseen generators. We propose DBINDS, a diffusion-model-inversion based detector that analyzes latent-space dynamics rather than pixels. We find that initial noise sequences recovered by diffusion inversion differ systematically between real and generated videos. Building on this, DBINDS forms an Initial Noise Difference Sequence (INDS) and extracts multi-domain, multi-scale features. With feature optimization and a LightGBM classifier tuned by Bayesian search, DBINDS (trained on a single generator) achieves strong cross-generator performance on GenVidBench, demonstrating good generalization and robustness in limited-data settings.
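The abstract does not define how the INDS is built or which features are extracted, so the following is only a minimal sketch under stated assumptions. It assumes the per-frame initial noise has already been recovered by some diffusion inversion step (not shown), takes the INDS to be the frame-to-frame difference of that noise, and computes three illustrative time-domain and frequency-domain statistics. The function names, the array layout `(T, C, H, W)`, and the choice of features are hypothetical and are not taken from the paper.

```python
import numpy as np


def initial_noise_difference_sequence(noise):
    """Frame-to-frame differences of recovered initial noise.

    Assumption: `noise` has shape (T, C, H, W), one latent per frame,
    already produced by a diffusion inversion step that is not shown here.
    """
    noise = np.asarray(noise, dtype=np.float64)
    if noise.ndim < 2 or noise.shape[0] < 2:
        raise ValueError("need at least two frames of noise")
    # Difference between consecutive frames: shape (T - 1, C, H, W).
    return np.diff(noise, axis=0)


def inds_features(inds):
    """Illustrative (hypothetical) features of a difference sequence.

    Returns [mean step magnitude, std of step magnitude,
             share of spectral energy in the upper half of frequencies].
    """
    steps = inds.shape[0]
    # L2 magnitude of each difference step, shape (T - 1,).
    energy = np.linalg.norm(inds.reshape(steps, -1), axis=1)
    # Power spectrum of the mean-removed magnitude signal.
    spectrum = np.abs(np.fft.rfft(energy - energy.mean())) ** 2
    half = len(spectrum) // 2
    # Small constant guards against division by zero for flat signals.
    high_freq_share = spectrum[half:].sum() / (spectrum.sum() + 1e-12)
    return np.array([energy.mean(), energy.std(), high_freq_share])


if __name__ == "__main__":
    # Synthetic stand-in for inverted noise; real input would come from inversion.
    rng = np.random.default_rng(0)
    fake_noise = rng.standard_normal((16, 4, 32, 32))
    features = inds_features(initial_noise_difference_sequence(fake_noise))
    print(features)
```

In a pipeline like the one the abstract describes, a feature vector of this kind would be computed per video and passed to a gradient-boosted classifier such as LightGBM; the actual feature set, selection step, and hyperparameter search used by DBINDS are described in the paper, not here.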
