CV | Apr 1, 2022

GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval

arXiv:2204.00486v5 | 30 citations | h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses the need for more robust and human-like video comprehension in computer vision, though it is incremental as it builds on existing event boundary research.

The authors address fine-grained video understanding by introducing Kinetic-GEB+, a dataset of over 170k boundaries with captions describing status changes in 12K videos. They propose three tasks on this dataset (boundary captioning, grounding, and retrieval) and a new TPD (Temporal-based Pairwise Difference) modeling method for visual difference, which achieves significant performance improvements on the baselines they evaluate.

Cognitive science has shown that humans perceive videos in terms of events separated by the state changes of dominant subjects. State changes trigger new events and are among the most useful signals in the large amount of redundant information perceived. However, previous research focuses on the overall understanding of segments without evaluating the fine-grained status changes within them. In this paper, we introduce a new dataset called Kinetic-GEB+. The dataset consists of over 170k boundaries associated with captions describing status changes in the generic events in 12K videos. Building on this dataset, we propose three tasks that support the development of a more fine-grained, robust, and human-like understanding of videos through status changes. We evaluate many representative baselines on our dataset and also design a new TPD (Temporal-based Pairwise Difference) Modeling method for visual difference, which achieves significant performance improvements. The results also show that current methods still face formidable challenges in utilizing different granularities, representing visual difference, and accurately localizing status changes. Further analysis shows that our dataset can drive the development of more powerful methods for understanding status changes and thus improve video-level comprehension. The dataset, including both videos and boundaries, is available at https://yuxuan-w.github.io/GEB-plus/

Code Implementations: 1 repo