MM Sep 29, 2017

Impact of Three-Dimensional Video Scalability on Multi-View Activity Recognition using Deep Learning

arXiv:1709.10206v1, 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of efficient activity recognition in resource-limited environments for applications like video surveillance or streaming, though it is incremental as it applies existing methods to a new scalability context.

The paper investigates how reducing video quality through three-dimensional scalability affects multi-view activity recognition, finding that deep learning methods achieve significantly improved robustness compared to feature-based methods and identifying optimal scalability combinations for resource-constrained systems.

Human activity recognition is an important research topic in computer vision and video understanding. It is often assumed that high-quality video sequences are available for recognition. However, relaxing this requirement and implementing robust recognition on videos with reduced data rates makes storing and transmitting video data more efficient. Three-dimensional video scalability, which refers to the possibility of reducing the spatial, temporal, and quality resolutions of videos, is an effective way to represent and manage video data flexibly. In this paper, we investigate the impact of video scalability on multi-view activity recognition. We employ both a spatiotemporal feature extraction-based method and a deep learning-based method using convolutional and recurrent neural networks. The recognition performance of the two methods is examined, along with an in-depth analysis of how their performance varies across scalability combinations. In particular, we demonstrate that the deep learning-based method can achieve significantly improved robustness in comparison to the feature-based method. Furthermore, we investigate optimal scalability combinations with respect to bitrate in order to provide useful guidelines for an optimal operation policy in resource-constrained activity recognition systems.
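
To make the three scalability dimensions named in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's pipeline and not how a scalable video codec works internally: the function `scale_video` and its parameters `spatial_step`, `temporal_step`, and `quant_step` are hypothetical names chosen for illustration, and rounding pixel values down to a coarser grid is only a crude stand-in for encoder quantization.

```python
# Illustrative sketch only: a toy model of three-dimensional scalability.
# All names are hypothetical; real scalable codecs work very differently.

def scale_video(frames, spatial_step=1, temporal_step=1, quant_step=1):
    """Reduce a video along the three scalability dimensions.

    frames: list of frames; each frame is a list of rows of 0-255 ints.
    spatial_step: keep every n-th row and every n-th pixel in each row.
    temporal_step: keep every n-th frame.
    quant_step: round pixel values down to a multiple of this step
                (assumed stand-in for coarser quantization).
    """
    scaled = []
    for frame in frames[::temporal_step]:            # temporal scalability
        rows = frame[::spatial_step]                 # spatial scalability (rows)
        scaled.append([
            [(pixel // quant_step) * quant_step      # quality scalability
             for pixel in row[::spatial_step]]       # spatial scalability (columns)
            for row in rows
        ])
    return scaled


# A tiny synthetic clip: 8 frames of 4x4 pixels.
clip = [[[(t * 16 + y * 4 + x) % 256 for x in range(4)] for y in range(4)]
        for t in range(8)]

reduced = scale_video(clip, spatial_step=2, temporal_step=4, quant_step=8)
# Frame count, rows per frame, pixels per row after reduction.
print(len(reduced), len(reduced[0]), len(reduced[0][0]))
```

Each combination of the three step values corresponds to one "scalability combination" in the sense of the abstract. A study like the one described would encode the video at each combination, measure the resulting bitrate, and run the recognizer on the result.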
