CV · Oct 4, 2021

How You Move Your Head Tells What You Do: Self-supervised Video Representation Learning with Egocentric Cameras and IMU Sensors

arXiv:2110.01680v1 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more efficient and scalable activity recognition in AR/VR applications. It is an incremental advance: it builds on existing self-supervised learning methods and contributes a novel data pairing approach, matching video clips with head-motion signals from IMU sensors.

The paper tackles two limitations of supervised activity recognition from head-mounted cameras in AR/VR applications: expensive human annotation and limited label coverage. It proposes a self-supervised learning approach that pairs video clips with head-motion data from IMU sensors. The authors report that the learned representation is effective for recognizing egocentric activities of people and dogs, though the abstract provides no concrete numbers.

Understanding users' activities from head-mounted cameras is a fundamental task for Augmented and Virtual Reality (AR/VR) applications. A typical approach is to train a classifier in a supervised manner using data labeled by humans. This approach has limitations due to the expensive annotation cost and the closed coverage of activity labels. A potential way to address these limitations is to use self-supervised learning (SSL). Instead of relying on human annotations, SSL leverages intrinsic properties of data to learn representations. We are particularly interested in learning egocentric video representations benefiting from the head-motion generated by users' daily activities, which can be easily obtained from IMU sensors embedded in AR/VR devices. Towards this goal, we propose a simple but effective approach to learn video representation by learning to tell the corresponding pairs of video clip and head-motion. We demonstrate the effectiveness of our learned representation for recognizing egocentric activities of people and dogs.
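
The pairing idea in the abstract, "learning to tell the corresponding pairs of video clip and head-motion", can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the loss, the negative-sampling scheme, or the encoders, so the binary correspondence objective, the in-batch negatives, and the names make_pairs and correspondence_loss below are all assumptions chosen for illustration. Embeddings are plain Python lists standing in for encoder outputs.

```python
import math
import random


def make_pairs(num_clips, rng):
    """Build (video_index, imu_index, label) triples.

    Assumption: label 1 means the IMU window was recorded together with the
    video clip; label 0 means it was drawn from a different clip.
    """
    pairs = []
    for i in range(num_clips):
        pairs.append((i, i, 1))  # synchronized pair
        j = rng.choice([k for k in range(num_clips) if k != i])
        pairs.append((i, j, 0))  # mismatched pair
    return pairs


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def correspondence_loss(video_emb, imu_emb, pairs):
    """Mean binary cross-entropy, using the dot product of the two
    embeddings as the logit for "these two belong together"."""
    total = 0.0
    for v, m, label in pairs:
        z = dot(video_emb[v], imu_emb[m])
        # Numerically stable BCE with logits.
        total += max(z, 0.0) - z * label + math.log1p(math.exp(-abs(z)))
    return total / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = make_pairs(4, rng)
    # Toy embeddings: each clip gets its own axis.
    emb = [[5.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(correspondence_loss(emb, emb, pairs))
```

In a real setup the two embedding lists would come from a video encoder and an IMU encoder trained jointly to minimize this loss, so that the video encoder has to capture the motion cues that distinguish matching from mismatched head-motion signals.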

