CV · AI · Jun 18, 2021

Discerning Generic Event Boundaries in Long-Form Wild Videos

arXiv:2106.10090v1 · 6 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses video understanding by detecting event boundaries without a predefined taxonomy, but it appears incremental as it builds on existing challenge frameworks.

The paper tackles the problem of detecting generic event boundaries in long-form wild videos using a two-stream inflated 3D convolutions architecture to learn spatio-temporal features, with results analyzed from experiments inspired by the CVPR 2021 Generic Event Boundary Detection Challenge.

Detecting generic, taxonomy-free event boundaries in videos represents a major stride forward towards holistic video understanding. In this paper we present a technique for generic event boundary detection based on a two stream inflated 3D convolutions architecture, which can learn spatio-temporal features from videos. Our work is inspired from the Generic Event Boundary Detection Challenge (part of CVPR 2021 Long Form Video Understanding - LOVEU Workshop). Throughout the paper we provide an in-depth analysis of the experiments performed along with an interpretation of the results obtained.
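
The abstract names a two-stream inflated 3D convolution architecture but does not give implementation details. As a rough illustration only, the sketch below shows the common way a 2D kernel is "inflated" along time (repeat it T times and divide by T, so a clip of identical frames yields the same response as the 2D filter) and a simple late fusion that averages per-frame boundary scores from an RGB stream and an optical-flow stream. The function names `inflate_kernel` and `fuse_two_stream`, the averaging fusion, and the toy values are assumptions made for this example, not the authors' code.

```python
def inflate_kernel(kernel_2d, time_depth):
    """Inflate a 2D kernel (list of rows) into a 3D kernel shaped [T][H][W].

    Assumed scheme: copy the 2D weights T times along the time axis and
    scale by 1/T, so the weights summed over time equal the 2D kernel.
    """
    if time_depth < 1:
        raise ValueError("time_depth must be >= 1")
    return [
        [[w / time_depth for w in row] for row in kernel_2d]
        for _ in range(time_depth)
    ]


def fuse_two_stream(rgb_scores, flow_scores):
    """Late fusion (assumed): average per-frame scores of the two streams."""
    if len(rgb_scores) != len(flow_scores):
        raise ValueError("both streams must score the same number of frames")
    return [(r + f) / 2 for r, f in zip(rgb_scores, flow_scores)]


# Toy values, chosen only to make the arithmetic easy to follow.
kernel_2d = [[1.0, 2.0], [3.0, 4.0]]
kernel_3d = inflate_kernel(kernel_2d, time_depth=4)
fused = fuse_two_stream([0.0, 1.0], [0.5, 0.5])
```

Summing `kernel_3d` over its time axis recovers `kernel_2d`, which is the property that lets a 3D network start from 2D image weights. Real systems would apply this to framework tensors rather than nested lists.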

