CV · Aug 15, 2016

Depth2Action: Exploring Embedded Depth for Large-Scale Action Recognition

arXiv:1608.04339v1 · 43 citations
Originality: Incremental advance
AI Analysis

This work addresses action recognition in video. It is an incremental advance: it builds on existing methods by adding depth information.

The paper tackles large-scale human action recognition in video using depth cues estimated from the videos themselves. Combined with appearance and motion information, the approach reaches state-of-the-art performance on three benchmarks.

This paper performs the first investigation into depth for large-scale human action recognition in video where the depth cues are estimated from the videos themselves. We develop a new framework called depth2action and experiment thoroughly into how best to incorporate the depth information. We introduce spatio-temporal depth normalization (STDN) to enforce temporal consistency in our estimated depth sequences. We also propose modified depth motion maps (MDMM) to capture the subtle temporal changes in depth. These two components significantly improve the action recognition performance. We evaluate our depth2action framework on three large-scale action recognition video benchmarks. Our model achieves state-of-the-art performance when combined with appearance and motion information, thus demonstrating that depth2action is indeed complementary to existing approaches.
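The abstract names two components, STDN and MDMM, without defining them. The sketch below is only a generic illustration of the two underlying ideas, not the paper's formulation: rescaling a whole clip with shared statistics (so that per-frame rescaling does not introduce flicker), and accumulating absolute frame-to-frame depth differences in the style of a classic depth motion map. The function names, the percentile bounds, and the `(T, H, W)` array layout are all assumptions made for this example.

```python
import numpy as np


def normalize_clip_depth(depth_clip, low_pct=1.0, high_pct=99.0):
    """Rescale a (T, H, W) depth clip to [0, 1] with clip-wide percentiles.

    Illustrative assumption: one pair of bounds is shared by every frame,
    so the scaling itself cannot vary from frame to frame. This is NOT
    the paper's STDN definition.
    """
    depth_clip = np.asarray(depth_clip, dtype=np.float64)
    lo, hi = np.percentile(depth_clip, [low_pct, high_pct])
    if hi <= lo:
        # Constant clip: nothing to rescale.
        return np.zeros_like(depth_clip)
    return np.clip((depth_clip - lo) / (hi - lo), 0.0, 1.0)


def accumulate_depth_motion(depth_clip):
    """Sum absolute differences between consecutive frames into an (H, W) map.

    This is the classic depth-motion-map idea, shown for context only;
    the paper's modified variant (MDMM) may differ.
    """
    depth_clip = np.asarray(depth_clip, dtype=np.float64)
    if depth_clip.shape[0] < 2:
        # A single frame carries no temporal change.
        return np.zeros(depth_clip.shape[1:])
    return np.abs(np.diff(depth_clip, axis=0)).sum(axis=0)
```

A typical use would be to call `normalize_clip_depth` on a stack of estimated depth frames and pass the result to `accumulate_depth_motion`, yielding a single 2D map that is large wherever depth changed over the clip.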

