CV · Jan 13, 2022

Hand-Object Interaction Reasoning

arXiv:2201.04906v1 · 6.58 citations

Originality: Incremental advance

AI Analysis

This work addresses action recognition in egocentric video. The advance is incremental: it builds on existing methods and narrows the focus to two-handed interactions.

The paper tackles the problem of modeling spatio-temporal relationships between hands and objects in video for action recognition, showing that modeling two-handed interactions improves performance on the EPIC-KITCHENS and Something-Else datasets.

This paper proposes an interaction reasoning network for modelling spatio-temporal relationships between hands and objects in video. The proposed interaction unit utilises a Transformer module to reason about each acting hand and its spatio-temporal relation to the other hand, as well as to the objects being interacted with. We show that modelling two-handed interactions is critical for action recognition in egocentric video, and demonstrate that by using positionally-encoded trajectories, the network can better recognise observed interactions. We evaluate our proposal on the EPIC-KITCHENS and Something-Else datasets, with an ablation study.
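
To make the idea of attending from an acting hand to the other hand and to objects over positionally-encoded trajectories more concrete, here is a minimal sketch in plain Python. It is not the paper's architecture. The function names (`sinusoidal_encoding`, `encode_trajectory`, `attend`), the box format, the encoding width, and the toy coordinates are all assumptions made for illustration, and the learned projections, multiple heads, and visual features that a real Transformer module would use are omitted.

```python
import math

def sinusoidal_encoding(value, dim):
    """Encode one scalar as `dim` sin/cos features (dim must be even)."""
    feats = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        feats.append(math.sin(value * freq))
        feats.append(math.cos(value * freq))
    return feats

def encode_trajectory(boxes, dim_per_value=4):
    """Flatten a per-frame list of (x1, y1, x2, y2) boxes into one token.

    Assumption: each coordinate and the frame index get their own
    sinusoidal encoding, and everything is concatenated.
    """
    token = []
    for t, box in enumerate(boxes):
        for coord in box:
            token.extend(sinusoidal_encoding(coord, dim_per_value))
        token.extend(sinusoidal_encoding(float(t), dim_per_value))
    return token

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, context):
    """Single-head scaled dot-product attention without learned weights.

    `query` is the acting-hand token; `context` holds the tokens it
    attends to (the other hand and the objects).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in context]
    weights = softmax(scores)
    fused = [sum(w * vec[j] for w, vec in zip(weights, context))
             for j in range(len(query))]
    return weights, fused

# Toy two-frame trajectories in normalised image coordinates (made up).
acting_hand = [(0.10, 0.50, 0.20, 0.60), (0.20, 0.50, 0.30, 0.60)]
other_hand = [(0.70, 0.50, 0.80, 0.60), (0.65, 0.50, 0.75, 0.60)]
held_object = [(0.40, 0.40, 0.50, 0.50), (0.40, 0.40, 0.50, 0.50)]

query = encode_trajectory(acting_hand)
context = [encode_trajectory(other_hand), encode_trajectory(held_object)]
weights, fused = attend(query, context)
```

Here `weights` holds one attention weight per context entity, and `fused` is the weighted mix of their trajectory tokens; a classifier head would normally consume such a vector alongside appearance features.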
