CV | AI | Aug 16, 2025

Generic Event Boundary Detection via Denoising Diffusion

arXiv:2508.12084v1 | 1 citation | h-index: 4
Originality: Incremental advance
AI Analysis

This paper addresses the subjectivity of event segmentation in videos, a computer vision problem. Its generative approach is an incremental advance over prior deterministic methods.

The paper tackles generic event boundary detection in videos by proposing a diffusion-based generative model, DiffGEBD, which generates diverse plausible boundaries instead of deterministic predictions, achieving strong performance on Kinetics-GEBD and TAPOS benchmarks.

Generic event boundary detection (GEBD) aims to identify natural boundaries in a video, segmenting it into distinct and meaningful chunks. Despite the inherent subjectivity of event boundaries, previous methods have focused on deterministic predictions, overlooking the diversity of plausible solutions. In this paper, we introduce a novel diffusion-based boundary detection model, dubbed DiffGEBD, that tackles the problem of GEBD from a generative perspective. The proposed model encodes relevant changes across adjacent frames via temporal self-similarity and then iteratively decodes random noise into plausible event boundaries, conditioned on the encoded features. Classifier-free guidance allows the degree of diversity to be controlled in denoising diffusion. In addition, we introduce a new evaluation metric to assess the quality of predictions considering both diversity and fidelity. Experiments show that our method achieves strong performance on two standard benchmarks, Kinetics-GEBD and TAPOS, generating diverse and plausible event boundaries.
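
The two building blocks named in the abstract, temporal self-similarity and classifier-free guidance, can be illustrated with a minimal NumPy sketch. This is not the DiffGEBD implementation: the function names, the use of cosine similarity, the toy features, and the guidance parametrization (1 + w) * eps_cond - w * eps_uncond (the common form from the classifier-free guidance literature) are all assumptions made for illustration, and the paper may define these components differently.

```python
import numpy as np


def temporal_self_similarity(features):
    """Cosine similarity between every pair of frame features.

    features: array of shape (T, D), one D-dim feature per frame
              (hypothetical input; the paper's encoder is not shown here).
    returns:  array of shape (T, T).
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-8, None)  # avoid division by zero
    return unit @ unit.T


def guided_noise(eps_cond, eps_uncond, w):
    """Combine conditional and unconditional noise predictions.

    Assumed parametrization: (1 + w) * eps_cond - w * eps_uncond.
    With w = 0 this reduces to the purely conditional prediction.
    """
    return (1.0 + w) * eps_cond - w * eps_uncond


# Toy video: frames 0-2 share one feature, frames 3-5 share another.
features = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
tsm = temporal_self_similarity(features)

# Similarity between each frame and the next one; a sharp drop
# hints at a change between adjacent frames.
adjacent = np.diag(tsm, k=1)
candidate = int(np.argmin(adjacent))  # index i means "between frame i and i+1"

# Toy noise predictions, only to show how the guidance weight acts.
eps_c = np.array([0.2, -0.1])
eps_u = np.array([0.0, 0.0])
eps_guided = guided_noise(eps_c, eps_u, w=1.0)
```

In this toy case the adjacent-frame similarity drops between frames 2 and 3, where the features change. In a diffusion sampler, a combination like `guided_noise` would be applied at every denoising step; under the assumed parametrization, a larger `w` pushes samples closer to the conditioning signal, which is the general mechanism by which guidance trades diversity for fidelity.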

