CV · HC · Nov 16, 2022

Egocentric Hand-object Interaction Detection

arXiv:2211.09067v1 · 2 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses a crucial task for egocentric human activity understanding, offering a more efficient solution compared to prior methods, though it is incremental in nature.

The paper tackles the problem of detecting hand-object interactions in egocentric videos by jointly analyzing hand pose and hand-object masks, achieving 89% accuracy on selected images from the EPIC-KITCHENS dataset while running at over 30 FPS.

In this paper, we propose a method to jointly determine the status of hand-object interaction. This is crucial for egocentric human activity understanding and interaction. From a computer vision perspective, we believe that determining whether a hand is interacting with an object depends on whether there is an interactive hand pose and whether the hand is touching the object. Thus, we extract the hand pose and the hand-object masks to jointly determine the interaction status. To address the difficulty of hand pose estimation under in-hand object occlusion, we use a multi-cam system to capture hand pose data from multiple perspectives. We evaluate and compare our method with the most recent work from Shan et al. [Shan20] on selected images from the EPIC-KITCHENS [damen2018scaling] dataset and achieve 89% accuracy on HOI (hand-object interaction) detection, which is comparable to Shan's (92%). However, for real-time performance, our method can run at over 30 FPS, which is much more efficient than Shan's (1 to 2 FPS). A demo can be found at https://www.youtube.com/watch?v=XVj3zBuynmQ
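The abstract's criterion (an interactive hand pose and a hand that touches the object) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the two cues are combined, so the AND rule, the `pose_score` input, the thresholds, and the dilation-based contact test below are all assumptions chosen for illustration. The function names are hypothetical.

```python
import numpy as np


def dilate(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Grow a boolean mask by one pixel per iteration (4-neighbourhood)."""
    out = mask.astype(bool)
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        out = (
            padded[1:-1, 1:-1]
            | padded[:-2, 1:-1]   # neighbour above
            | padded[2:, 1:-1]    # neighbour below
            | padded[1:-1, :-2]   # neighbour to the left
            | padded[1:-1, 2:]    # neighbour to the right
        )
    return out


def masks_touch(hand_mask: np.ndarray, object_mask: np.ndarray,
                margin_px: int = 2) -> bool:
    """Assumed contact test: the hand mask, grown by margin_px, overlaps the object mask."""
    grown_hand = dilate(hand_mask, margin_px)
    return bool(np.any(grown_hand & object_mask.astype(bool)))


def is_interacting(pose_score: float, hand_mask: np.ndarray,
                   object_mask: np.ndarray, pose_threshold: float = 0.5,
                   margin_px: int = 2) -> bool:
    """Assumed decision rule: interactive pose AND hand-object contact.

    pose_score is a hypothetical confidence in [0, 1] that the estimated
    hand pose is an interactive one (for example, a grasp).
    """
    has_interactive_pose = pose_score >= pose_threshold
    return bool(has_interactive_pose and masks_touch(hand_mask, object_mask, margin_px))


# Toy 8x8 frame: the hand occupies columns 0-2, the object columns 4-5.
hand = np.zeros((8, 8), dtype=bool)
hand[2:5, 0:3] = True
obj = np.zeros((8, 8), dtype=bool)
obj[2:5, 4:6] = True

print(is_interacting(0.9, hand, obj))               # pose and contact both present
print(is_interacting(0.2, hand, obj))               # contact, but no interactive pose
print(is_interacting(0.9, hand, obj, margin_px=1))  # pose, but masks too far apart
```

Requiring both cues means a hand resting near an object without a grasp-like pose, or a grasp-like pose in empty space, is not reported as an interaction. A learned fusion of the two cues would be an equally plausible reading of the abstract.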

