CV · Nov 24, 2017

Summarizing First-Person Videos from Third Persons' Points of Views

arXiv:1711.08922v2 · 32 citations
AI Analysis

This paper addresses a domain-specific problem relevant to applications such as viewing or searching first-person videos. It is an incremental advance that adapts summarization methods to a new video type.

The paper tackles the summarization of first-person videos, to which existing methods trained on third-person videos do not easily generalize. It proposes a novel deep neural network architecture trained in a semi-supervised setting. The abstract reports qualitative and quantitative evaluations on benchmarks and the authors' collected first-person datasets, but gives no specific numerical results.

Video highlight or summarization is among interesting topics in computer vision, which benefits a variety of applications like viewing, searching, or storage. However, most existing studies rely on training data of third-person videos, which cannot easily generalize to highlight the first-person ones. With the goal of deriving an effective model to summarize first-person videos, we propose a novel deep neural network architecture for describing and discriminating vital spatiotemporal information across videos with different points of view. Our proposed model is realized in a semi-supervised setting, in which fully annotated third-person videos, unlabeled first-person videos, and a small number of annotated first-person ones are presented during training. In our experiments, qualitative and quantitative evaluations on both benchmarks and our collected first-person video datasets are presented.

