CVMay 12, 2016

Going Deeper into First-Person Activity Recognition

arXiv:1605.03688v1316 citations
Originality Incremental advance
AI Analysis

This work addresses the problem of egocentric action recognition for applications like wearable computing, but it is incremental as it builds on existing feature design ideas.

The paper tackled first-person activity recognition by proposing a twin-stream CNN architecture that integrates appearance and motion information, achieving a 6.6% average accuracy increase over state-of-the-art methods on benchmark datasets.

We bring together ideas from recent work on feature design for egocentric action recognition under one framework by exploring the use of deep convolutional neural networks (CNN). Recent work has shown that features such as hand appearance, object attributes, local hand motion and camera ego-motion are important for characterizing first-person actions. To integrate these ideas under one framework, we propose a twin stream network architecture, where one stream analyzes appearance information and the other stream analyzes motion information. Our appearance stream encodes prior knowledge of the egocentric paradigm by explicitly training the network to segment hands and localize objects. By visualizing certain neuron activation of our network, we show that our proposed architecture naturally learns features that capture object attributes and hand-object configurations. Our extensive experiments on benchmark egocentric action datasets show that our deep architecture enables recognition rates that significantly outperform state-of-the-art techniques -- an average $6.6\%$ increase in accuracy over all datasets. Furthermore, by learning to recognize objects, actions and activities jointly, the performance of individual recognition tasks also increase by $30\%$ (actions) and $14\%$ (objects). We also include the results of extensive ablative analysis to highlight the importance of network design decisions..

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes