CV · Jun 19, 2023

Multi-Granularity Hand Action Detection

arXiv:2306.10858v2 · 2 citations · h-index: 44 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the lack of fine-grained hand-action localization in video understanding. The advance is incremental: it builds on existing action detection methods, adding a new dataset and model.

The paper tackles fine-grained hand action detection in videos. It introduces the FHA-Kitchens dataset, with 2,377 clips and 880 action categories, and proposes MG-HAD, an end-to-end method for multi-granularity detection built on two new designs: Multi-dimensional Action Queries and Coarse-Fine Contrastive Denoising.

Detecting hand actions in videos is crucial for understanding video content and has diverse real-world applications. Existing approaches often focus on whole-body actions or coarse-grained action categories, lacking fine-grained hand-action localization information. To fill this gap, we introduce the FHA-Kitchens (Fine-Grained Hand Actions in Kitchen Scenes) dataset, providing both coarse- and fine-grained hand action categories along with localization annotations. This dataset comprises 2,377 video clips and 30,047 frames, annotated with approximately 200k bounding boxes and 880 action categories. Evaluation of existing action detection methods on FHA-Kitchens reveals varying generalization capabilities across different granularities. To handle multi-granularity in hand actions, we propose MG-HAD, an End-to-End Multi-Granularity Hand Action Detection method. It incorporates two new designs: Multi-dimensional Action Queries and Coarse-Fine Contrastive Denoising. Extensive experiments demonstrate MG-HAD's effectiveness for multi-granularity hand action detection, highlighting the significance of FHA-Kitchens for future research and real-world applications. The dataset and source code are available at https://github.com/superZ678/MG-HAD.

Code Implementations: 2 repos
