CV · Oct 20, 2022

Transformer-based Action recognition in hand-object interacting scenarios

arXiv:2210.11387v1 · 2 citations · h-index: 15
Originality: Synthesis-oriented
AI Analysis

This work addresses action recognition in hand-object interaction scenarios for computer vision applications. The contribution is incremental, since it builds on existing Transformer methods.

The paper tackles hand-object interaction recognition in egocentric views with a Transformer-based keypoint estimation framework, achieving a top-1 accuracy of 87.19% on the test set.

This report describes the 2nd place solution to the ECCV 2022 Human Body, Hands, and Activities (HBHA) from Egocentric and Multi-view Cameras Challenge: Action Recognition. The challenge aims to recognize hand-object interactions in an egocentric view. We propose a framework that estimates the keypoints of two hands and an object with a Transformer-based keypoint estimator and recognizes actions based on the estimated keypoints. We achieved a top-1 accuracy of 87.19% on the test set.
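The two-stage structure described in the abstract (keypoint estimation, then action recognition from the keypoints) can be sketched roughly as follows. This is a minimal illustration of the data flow only, not the authors' implementation: the keypoint layout (21 joints per hand, 8 object corners, 3D coordinates), the dummy estimator, the temporal averaging, and the nearest-prototype classifier are all assumptions made for the example, and the action labels are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

# Assumed layout (not stated in the abstract): 21 joints per hand,
# 8 corners of an object bounding box, 3D coordinates per keypoint.
NUM_HAND_JOINTS = 21
NUM_OBJECT_CORNERS = 8
KEYPOINTS_PER_FRAME = 2 * NUM_HAND_JOINTS + NUM_OBJECT_CORNERS
COORD_DIM = 3

Keypoints = List[List[float]]  # KEYPOINTS_PER_FRAME rows of COORD_DIM values


def dummy_keypoint_estimator(frame: int) -> Keypoints:
    """Hypothetical stand-in for the Transformer-based keypoint estimator.

    A frame is represented by an integer id and the output is
    deterministic pseudo-random data, so the sketch runs without a model.
    """
    rng = random.Random(frame)
    return [
        [rng.uniform(-1.0, 1.0) for _ in range(COORD_DIM)]
        for _ in range(KEYPOINTS_PER_FRAME)
    ]


def clip_to_feature(
    frames: Sequence[int], estimator: Callable[[int], Keypoints]
) -> List[float]:
    """Stage 1: estimate keypoints per frame, then average over time."""
    per_frame = [[c for kp in estimator(f) for c in kp] for f in frames]
    n = len(per_frame)
    return [sum(column) / n for column in zip(*per_frame)]


def classify_action(feature: List[float], prototypes: Dict[str, List[float]]) -> str:
    """Stage 2 stand-in: pick the label of the nearest prototype feature."""
    def squared_distance(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(prototypes, key=lambda label: squared_distance(feature, prototypes[label]))


# Usage with hypothetical action labels and integer frame ids.
feature_pour = clip_to_feature([0, 1, 2, 3], dummy_keypoint_estimator)
feature_open = clip_to_feature([10, 11, 12, 13], dummy_keypoint_estimator)
prototypes = {"pour": feature_pour, "open": feature_open}
predicted = classify_action(feature_pour, prototypes)
```

In a real system the estimator would be a trained network operating on egocentric images, and the second stage would be a learned classifier over the keypoint sequence rather than a nearest-prototype lookup.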

