CVJul 27, 2015

Discovery of Shared Semantic Spaces for Multi-Scene Video Query and Summarization

arXiv:1507.07458v133 citations
Originality Incremental advance
AI Analysis

This addresses the need for more efficient and scalable surveillance systems by reducing manual effort through shared semantic analysis across multiple scenes, though it is incremental in extending existing single-scene methods.

The paper tackles the problem of automated video surveillance analysis by developing a framework that clusters scenes based on shared behaviors and identifies which activities are common versus scene-specific, improving tasks like cross-scene query and behavior classification compared to single-scene models.

The growing rate of public space CCTV installations has generated a need for automated methods for exploiting video surveillance data including scene understanding, query, behaviour annotation and summarization. For this reason, extensive research has been performed on surveillance scene understanding and analysis. However, most studies have considered single scenes, or groups of adjacent scenes. The semantic similarity between different but related scenes (e.g., many different traffic scenes of similar layout) is not generally exploited to improve any automated surveillance tasks and reduce manual effort. Exploiting commonality, and sharing any supervised annotations, between different scenes is however challenging due to: Some scenes are totally un-related -- and thus any information sharing between them would be detrimental; while others may only share a subset of common activities -- and thus information sharing is only useful if it is selective. Moreover, semantically similar activities which should be modelled together and shared across scenes may have quite different pixel-level appearance in each scene. To address these issues we develop a new framework for distributed multiple-scene global understanding that clusters surveillance scenes by their ability to explain each other's behaviours; and further discovers which subset of activities are shared versus scene-specific within each cluster. We show how to use this structured representation of multiple scenes to improve common surveillance tasks including scene activity understanding, cross-scene query-by-example, behaviour classification with reduced supervised labelling requirements, and video summarization. In each case we demonstrate how our multi-scene model improves on a collection of standard single scene models and a flat model of all scenes.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes