CV, CL | Mar 24, 2014

Coherent Multi-Sentence Video Description with Variable Level of Detail

arXiv:1403.6173v1 | 230 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more natural and adaptable video description systems. The advance is incremental: it builds on existing methods by adding multi-sentence coherence and variable detail.

The paper tackles the problem of generating coherent multi-sentence video descriptions with variable levels of detail, addressing limitations in existing single-sentence and fixed-detail approaches. It achieves this by predicting a semantic representation from video and enforcing topic consistency, with human judges rating the descriptions as more readable, correct, and relevant than related work.

Humans can easily describe what they see in a coherent way and at varying levels of detail. However, existing approaches for automatic video description are mainly focused on single sentence generation and produce descriptions at a fixed level of detail. In this paper, we address both of these limitations: we produce coherent multi-sentence descriptions of complex videos at a variable level of detail. We follow a two-step approach where we first learn to predict a semantic representation (SR) from video and then generate natural language descriptions from the SR. To produce consistent multi-sentence descriptions, we model across-sentence consistency at the level of the SR by enforcing a consistent topic. We also contribute to the visual recognition of objects, proposing a hand-centric approach, and to the robust generation of sentences using a word lattice. Human judges rate our multi-sentence descriptions as more readable, correct, and relevant than related work. To understand the difference between more detailed and shorter descriptions, we collect and analyze a video description corpus with three levels of detail.
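The two-step idea in the abstract (predict an SR per video segment, keep the topic consistent across sentences, then generate text) can be illustrated with a toy sketch. This is not the authors' model: the candidate lists, scores, the `pick_topic` and `describe` helpers, and the sentence template are all hypothetical and stand in for the learned SR predictor and the word-lattice generator. It only shows why choosing one topic for the whole video can override a locally higher-scoring but inconsistent prediction.

```python
from collections import defaultdict

# Hypothetical per-segment SR candidates: (topic, (activity, object), score).
# All values are invented for illustration; a real system would predict them from video.
segments = [
    [("carrot", ("takes out", "carrot"), 0.6), ("cucumber", ("takes out", "cucumber"), 0.7)],
    [("carrot", ("peels", "carrot"), 0.8), ("cucumber", ("peels", "cucumber"), 0.3)],
    [("carrot", ("cuts", "carrot"), 0.7), ("cucumber", ("cuts", "cucumber"), 0.4)],
]


def pick_topic(segments):
    """Choose one topic for the whole video by summing the best score per topic per segment."""
    totals = defaultdict(float)
    for candidates in segments:
        best = {}
        for topic, _, score in candidates:
            best[topic] = max(best.get(topic, 0.0), score)
        for topic, score in best.items():
            totals[topic] += score
    return max(totals, key=totals.get)


def describe(segments):
    """Return the shared topic and one template sentence per segment, restricted to that topic."""
    topic = pick_topic(segments)
    sentences = []
    for candidates in segments:
        on_topic = [c for c in candidates if c[0] == topic]
        if not on_topic:
            continue  # no candidate agrees with the video-level topic; skip the segment
        _, (activity, obj), _ = max(on_topic, key=lambda c: c[2])
        # Simple template standing in for the sentence generation step.
        sentences.append(f"The person {activity} the {obj}.")
    return topic, sentences


topic, sentences = describe(segments)
```

In this made-up data, an independent per-segment choice would pick "cucumber" for the first segment, while the shared topic keeps all three sentences about the carrot.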

