CV · Apr 25, 2016

Long-Term Identity-Aware Multi-Person Tracking for Surveillance Video Summarization

arXiv:1604.07468v2 · 15 citations
Originality: Incremental advance
AI Analysis

The paper addresses the problem of summarizing vast amounts of surveillance video for security and monitoring applications, but the advance is incremental: it builds on existing tracking methods and adds adaptations for long-term use.

The paper tackles long-term multi-person tracking in month-long, multi-camera surveillance videos by using face recognition and appearance-spatial manifolds to handle appearance changes and re-entries, achieving 53.2% recall and 69.8% precision on a 23-day dataset, and demonstrates video summarization to generate visual diaries.

Multi-person tracking plays a critical role in the analysis of surveillance video. However, most existing work focuses on shorter-term (e.g., minute-long or hour-long) video sequences. Therefore, we propose a multi-person tracking algorithm for very long-term (e.g., month-long) multi-camera surveillance scenarios. Long-term tracking is challenging because 1) the apparel and appearance of the same person vary greatly over multiple days and 2) a person leaves and re-enters the scene numerous times. To tackle these challenges, we leverage face recognition information, which is robust to apparel change, to automatically reinitialize our tracker over multiple days of recordings. Unfortunately, recognized faces are often unavailable. Therefore, our tracker propagates identity information to frames without recognized faces by uncovering the appearance and spatial manifold formed by person detections. We tested our algorithm on a 23-day, 15-camera data set (4,935 hours total), and we were able to localize a person 53.2% of the time with 69.8% precision. We further performed video summarization experiments based on our tracking output. Results on 116.25 hours of video showed that we were able to generate a reasonable visual diary (i.e., a summary of what a person did) for different people, thus potentially opening the door to automatic summarization of the vast amount of surveillance video generated every day.
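
The idea of propagating identities from a few recognized faces to the remaining person detections can be illustrated with a small sketch. This is not the paper's solver: it swaps in generic graph-based label propagation, and every name and parameter here (`propagate_identities`, `sigma_app`, `sigma_pos`, `max_dt`, `alpha`) is a hypothetical choice made for illustration. It assumes each detection has an appearance feature vector, a ground-plane position, and a timestamp, and that a label of `-1` means no face was recognized.

```python
import numpy as np

def propagate_identities(features, positions, times, seed_labels, n_ids,
                         sigma_app=1.0, sigma_pos=1.0, max_dt=1.0,
                         alpha=0.9, n_iter=50):
    """Spread sparse identity labels over all detections.

    Illustrative label propagation only, not the paper's optimizer.
    seed_labels[i] is an identity index, or -1 if no face was recognized.
    """
    features = np.asarray(features, dtype=float)
    positions = np.asarray(positions, dtype=float)
    times = np.asarray(times, dtype=float)
    seed_labels = np.asarray(seed_labels)
    n = len(features)

    # Appearance affinity: similar-looking detections are linked.
    d_app = np.linalg.norm(features[:, None] - features[None], axis=2)
    w_app = np.exp(-(d_app / sigma_app) ** 2)

    # Spatial affinity: detections that are close in space AND in time are linked.
    d_pos = np.linalg.norm(positions[:, None] - positions[None], axis=2)
    dt = np.abs(times[:, None] - times[None])
    w_pos = np.exp(-(d_pos / sigma_pos) ** 2) * (dt <= max_dt)

    # Combined graph (assumption: a plain sum of the two affinities).
    W = w_app + w_pos
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    deg = np.maximum(W.sum(axis=1), 1e-12)
    S = W / np.sqrt(np.outer(deg, deg))

    # One-hot seed matrix: rows of unlabeled detections stay all-zero.
    Y = np.zeros((n, n_ids))
    seeded = seed_labels >= 0
    Y[np.where(seeded)[0], seed_labels[seeded]] = 1.0

    # Iterate F <- alpha * S @ F + (1 - alpha) * Y.
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y

    labels = F.argmax(axis=1)
    labels[F.sum(axis=1) == 0] = -1  # nothing reached this detection
    return labels

# Toy data: two people, three detections each, one recognized face per person.
features = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
positions = [[0, 0], [1, 0], [2, 0], [10, 10], [11, 10], [12, 10]]
times = [0, 1, 2, 0, 1, 2]
seeds = [0, -1, -1, 1, -1, -1]
labels = propagate_identities(features, positions, times, seeds, n_ids=2)
print(labels)
```

The sketch leaves out constraints a real tracker would need, such as preventing two detections in the same frame at different locations from receiving the same identity, and it builds a dense affinity matrix that would not scale to month-long recordings.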

