CV · AI · Nov 29, 2021

UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection

arXiv:2111.14799v2 · 32 citations
AI Analysis

This work addresses the need for interpretable, semantically valid video parsing in video understanding applications, and it proposes a new approach rather than an incremental extension of existing methods.

The paper tackles Generic Event Boundary Detection (GEBD) in videos with a framework built on the Temporal Self-similarity Matrix (TSM) representation. It reports state-of-the-art results by a large margin on the GEBD benchmark, and its unsupervised variant outperforms the previous state-of-the-art supervised model.

Generic Event Boundary Detection (GEBD) is a newly suggested video understanding task that aims to find one level deeper semantic boundaries of events. Bridging the gap between natural human perception and video understanding, it has various potential applications, including interpretable and semantically valid video parsing. Still at an early development stage, existing GEBD solvers are simple extensions of relevant video understanding tasks, disregarding GEBD's distinctive characteristics. In this paper, we propose a novel framework for unsupervised/supervised GEBD, by using the Temporal Self-similarity Matrix (TSM) as the video representation. The new Recursive TSM Parsing (RTP) algorithm exploits local diagonal patterns in TSM to detect boundaries, and it is combined with the Boundary Contrastive (BoCo) loss to train our encoder to generate more informative TSMs. Our framework can be applied to both unsupervised and supervised settings, with both achieving state-of-the-art performance by a huge margin in GEBD benchmark. Especially, our unsupervised method outperforms the previous state-of-the-art "supervised" model, implying its exceptional efficacy.
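To make the TSM idea concrete, here is a minimal Python sketch, not the authors' implementation. It builds a cosine-similarity TSM from per-frame features and scores candidate boundaries with a simple checkerboard-style contrast along the diagonal. That contrast is a stand-in for the paper's RTP algorithm, whose details the abstract does not give. The function names, the window size `k`, and the toy features are all assumptions made for illustration.

```python
import numpy as np


def temporal_self_similarity(features: np.ndarray) -> np.ndarray:
    """Cosine-similarity TSM for frame features of shape (T, D)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-8, None)  # avoid division by zero
    return unit @ unit.T


def boundary_scores(tsm: np.ndarray, k: int = 2) -> np.ndarray:
    """Diagonal contrast score per frame (illustrative, NOT the paper's RTP).

    For a candidate boundary t, compare similarity inside the k frames
    before t and the k frames after t against similarity across the two.
    """
    num_frames = tsm.shape[0]
    scores = np.zeros(num_frames)
    for t in range(k, num_frames - k + 1):
        past = tsm[t - k:t, t - k:t].mean()
        future = tsm[t:t + k, t:t + k].mean()
        cross = tsm[t - k:t, t:t + k].mean()
        scores[t] = 0.5 * (past + future) - cross
    return scores


# Toy "video": 6 frames of one event followed by 6 frames of another.
# These synthetic features stand in for the output of a trained encoder.
rng = np.random.default_rng(0)
event_a = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((6, 3))
event_b = np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((6, 3))
features = np.vstack([event_a, event_b])

tsm = temporal_self_similarity(features)
scores = boundary_scores(tsm)
predicted_boundary = int(np.argmax(scores))
print(predicted_boundary)
```

On this toy input the TSM shows two bright blocks along the diagonal, and the score peaks at frame index 6, where the first block ends and the second begins. Real features are far noisier, which is the motivation the abstract gives for training the encoder with the BoCo loss so that it produces more informative TSMs.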
