CV Jun 10, 2019

The role of ego vision in view-invariant action recognition

arXiv:1906.03918v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of processing egocentric video data for computer vision researchers, but its contribution is incremental, since it builds on existing transfer learning methods.

The study tackled the problem of understanding how egocentric (first-person) video differs from third-person video for action recognition, and demonstrated through transfer learning that Convolutional Neural Networks can implicitly learn view-invariant representations, though with limitations.

Analysis and interpretation of egocentric video data are becoming increasingly important with the growing availability and use of wearable cameras. Exploring and fully understanding the affinities and differences between ego (first-person) and allo (third-person) vision is paramount for the design of effective methods to process, analyse and interpret egocentric data. In addition, a deeper understanding of ego-vision and its peculiarities may enable new research perspectives in which first-person viewpoints can act either as a means of easily acquiring large amounts of data to be employed in general-purpose recognition systems, or as a challenging test-bed to assess the usability of techniques specifically tailored to allocentric vision in more challenging settings. Our work, with an eye to cognitive science findings, leverages transfer learning in Convolutional Neural Networks to demonstrate the capabilities and limitations of an implicitly learnt view-invariant representation in the specific case of action recognition.
