CV · Mar 26, 2018

Unsupervised Learning and Segmentation of Complex Activities from Video

arXiv:1803.09490v1 · 125 citations
Originality: Incremental advance
AI Analysis

The paper addresses video activity segmentation, a computer vision task, and represents an incremental improvement over existing methods.

The paper tackles unsupervised segmentation of complex activities in video into sub-activities without textual input, and reports results that outperform both the unsupervised and the weakly-supervised state of the art on the Breakfast Actions and Inria Instructional Videos datasets.

This paper presents a new method for unsupervised segmentation of complex activities from video into multiple steps, or sub-activities, without any textual input. We propose an iterative discriminative-generative approach which alternates between discriminatively learning the appearance of sub-activities from the videos' visual features to sub-activity labels and generatively modelling the temporal structure of sub-activities using a Generalized Mallows Model. In addition, we introduce a model for background to account for frames unrelated to the actual activities. Our approach is validated on the challenging Breakfast Actions and Inria Instructional Videos datasets and outperforms both unsupervised and weakly-supervised state of the art.
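For readers unfamiliar with the Generalized Mallows Model named in the abstract, the sketch below shows its standard textbook form: a distribution over orderings of K items that puts the most mass on a canonical ordering and penalizes each inversion through per-item dispersion parameters. This is not the authors' implementation. The function names (`inversion_counts`, `gmm_log_prob`), the use of the identity ordering `0..K-1` as the canonical ordering, and the `rho` values are all assumptions made for illustration.

```python
import itertools
import math


def inversion_counts(ordering):
    """Return v, where v[j] is the number of items larger than j that
    appear before j in `ordering` (a permutation of 0..K-1).

    The last item can never have a larger item before it, so v has
    K-1 entries. The identity ordering gives all zeros.
    """
    k = len(ordering)
    position = {item: idx for idx, item in enumerate(ordering)}
    return [
        sum(1 for i in range(j + 1, k) if position[i] < position[j])
        for j in range(k - 1)
    ]


def gmm_log_prob(ordering, rho):
    """Log-probability of `ordering` under a Generalized Mallows Model
    whose canonical ordering is the identity (assumption for this sketch).

    rho[j] > 0 is the dispersion parameter for item j; larger values
    concentrate more mass on orderings close to the canonical one.
    """
    k = len(ordering)
    v = inversion_counts(ordering)
    log_p = 0.0
    for j in range(k - 1):
        # v[j] can take values 0..K-1-j, so normalize over that range.
        log_norm = math.log(sum(math.exp(-rho[j] * r) for r in range(k - j)))
        log_p += -rho[j] * v[j] - log_norm
    return log_p


if __name__ == "__main__":
    rho = [1.0, 0.5]  # hypothetical dispersion values for K = 3
    for perm in itertools.permutations(range(3)):
        print(perm, round(math.exp(gmm_log_prob(perm, rho)), 4))
```

In a setting like the one the abstract describes, the items would presumably correspond to sub-activities and an ordering to the sequence in which they occur in one video; how the paper couples this prior with the discriminative appearance model is not specified in the text above.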

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
