CV · Jul 3, 2022

Exploiting Context Information for Generic Event Boundary Captioning

Jinrui Zhang, Teng Wang, Feng Zheng, Ran Cheng, Ping Luo

arXiv:2207.01050v1 · 4.86 citations · h-index: 29 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of generating descriptive captions for event boundaries in videos. The advance is incremental: it builds on prior methods by incorporating context information.

The paper tackles Generic Event Boundary Captioning (GEBC) with a model that takes the whole video as input and generates captions for all boundaries in parallel, using boundary-boundary interactions to capture context. It scores 72.84 on the test set and placed second in the challenge.

Generic Event Boundary Captioning (GEBC) aims to generate three sentences describing the status change for a given time boundary. Previous methods process a single boundary at a time and therefore make little use of video context. To tackle this issue, we design a model that takes the whole video as input and generates captions for all boundaries in parallel. The model can learn the context information for each time boundary by modeling boundary-boundary interactions. Experiments demonstrate the effectiveness of context information. The proposed method achieved a score of 72.84 on the test set, and we reached 2nd place in this challenge. Our code is available at: https://github.com/zjr2000/Context-GEBC
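
The idea of boundary-boundary interaction can be illustrated with a small sketch. The abstract does not say which mechanism the model uses, so the scaled dot-product self-attention below is an assumption chosen for illustration, not the authors' implementation. The function names and the toy features are hypothetical, and no learned projections are included.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def boundary_self_attention(boundary_feats):
    """Mix each boundary feature with all others (assumed mechanism).

    boundary_feats: list of N feature vectors, one per boundary, each of length D.
    Returns N context-aware vectors of length D. Queries, keys, and values are
    the raw features; a real model would apply learned projections first.
    """
    dim = len(boundary_feats[0])
    scale = math.sqrt(dim)
    contextual = []
    for query in boundary_feats:
        # Similarity of this boundary to every boundary in the video.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in boundary_feats
        ]
        weights = softmax(scores)
        # Weighted average of all boundary features.
        mixed = [
            sum(w * feat[d] for w, feat in zip(weights, boundary_feats))
            for d in range(dim)
        ]
        contextual.append(mixed)
    return contextual


# Toy input: three boundaries with 2-dimensional features (hypothetical values).
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = boundary_self_attention(feats)
```

All boundaries are processed in one pass over the same feature list, which mirrors the parallel, whole-video formulation described in the abstract. A per-boundary captioning head would then consume each row of `context`.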

