Discerning Generic Event Boundaries in Long-Form Wild Videos
This work addresses video understanding by detecting event boundaries without a predefined taxonomy, but it appears incremental as it builds on existing challenge frameworks.
The paper tackles generic event boundary detection in long-form wild videos using a two-stream inflated 3D convolution architecture to learn spatio-temporal features, and it analyzes experiments inspired by the CVPR 2021 Generic Event Boundary Detection Challenge.
Detecting generic, taxonomy-free event boundaries in videos represents a major stride forward towards holistic video understanding. In this paper we present a technique for generic event boundary detection based on a two-stream inflated 3D convolution architecture, which can learn spatio-temporal features from videos. Our work is inspired by the Generic Event Boundary Detection Challenge, part of the CVPR 2021 Long Form Video Understanding (LOVEU) Workshop. Throughout the paper we provide an in-depth analysis of the experiments performed along with an interpretation of the results obtained.
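To make the term "inflated 3D convolution" concrete, the following is a minimal sketch of the usual inflation rule: a 2D kernel is repeated along a new time axis and divided by the temporal depth. It is written in plain Python for illustration only and is not the implementation used in this paper. The helper names (`inflate_kernel`, `response_2d`, `response_3d`) and the example kernel and frame values are hypothetical.

```python
def inflate_kernel(kernel_2d, time_depth):
    """Inflate a 2D kernel (rows x cols) into a 3D kernel (time x rows x cols).

    Each temporal slice is the 2D kernel divided by time_depth, so a clip made
    of identical frames yields the same response as the 2D kernel on one frame.
    """
    if time_depth < 1:
        raise ValueError("time_depth must be >= 1")
    return [
        [[w / time_depth for w in row] for row in kernel_2d]
        for _ in range(time_depth)
    ]


def response_2d(kernel_2d, patch):
    """Dot product of a 2D kernel with an equally sized image patch."""
    return sum(
        w * x
        for k_row, p_row in zip(kernel_2d, patch)
        for w, x in zip(k_row, p_row)
    )


def response_3d(kernel_3d, clip):
    """Dot product of a 3D kernel with an equally sized stack of patches."""
    return sum(response_2d(k, frame) for k, frame in zip(kernel_3d, clip))


# Hypothetical example values: a Sobel-like 2D kernel and one 3x3 patch.
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
frame = [[3.0, 1.0, 0.0], [4.0, 1.0, 2.0], [5.0, 9.0, 2.0]]

static_clip = [frame] * 4           # a "video" of four identical frames
inflated = inflate_kernel(kernel, 4)

r2 = response_2d(kernel, frame)
r3 = response_3d(inflated, static_clip)
```

On a static clip the inflated kernel reproduces the 2D response (`r2` and `r3` agree), which is why 2D filters can serve as a starting point for a 3D network before it learns temporal structure.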