CV | Apr 6, 2023

Therbligs in Action: Video Understanding through Motion Primitives

Eadom Dessalene, Michael Maynord, Cornelia Fermuller, Yiannis Aloimonos

arXiv:2304.03631v1 | 7.611 citations | h-index: 54

Originality: Incremental advance

AI Analysis

This work targets video action understanding, offering computer vision researchers a novel representation that augments existing methods; the advance is incremental because it builds on prior architectures.

The paper tackles video understanding by introducing Therbligs as motion primitives for rule-based, compositional, and hierarchical action modeling. It reports average relative improvements of 10.5%/7.53%/6.5% on EPIC Kitchens and 8.9%/6.63%/4.8% on 50-Salads for action segmentation, anticipation, and recognition, respectively.

In this paper we introduce a rule-based, compositional, and hierarchical modeling of action using Therbligs as our atoms. Introducing these atoms provides us with a consistent, expressive, contact-centered representation of action. Over the atoms we introduce a differentiable method of rule-based reasoning to regularize for logical consistency. Our approach is complementary to other approaches in that the Therblig-based representations produced by our architecture augment rather than replace existing architectures' representations. We release the first Therblig-centered annotations over two popular video datasets: EPIC Kitchens 100 and 50-Salads. We also broadly demonstrate benefits to adopting Therblig representations through evaluation on the following tasks: action segmentation, action anticipation, and action recognition, observing an average 10.5%/7.53%/6.5% relative improvement, respectively, over EPIC Kitchens and an average 8.9%/6.63%/4.8% relative improvement, respectively, over 50-Salads. Code and data will be made publicly available.
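To make the idea of a differentiable rule-based regularizer concrete, here is a minimal sketch. It is not the authors' formulation: the rule ("a release at step t implies the object was held at step t-1"), the function names, and the probability sequences are all hypothetical, and the soft implication p * (1 - q) is one common relaxation chosen for illustration. The penalty is a polynomial in the predicted probabilities, so in principle it could be added to a task loss and trained with gradient descent.

```python
# Hypothetical illustration only: a soft logical-consistency penalty over
# per-frame probabilities. The rule and all names are assumptions, not the
# paper's actual method.

def implication_penalty(p_premise, p_conclusion):
    """Soft violation of 'premise implies conclusion': p * (1 - q).

    The value is 0 when the premise is false or the conclusion is true,
    and 1 when the premise is true and the conclusion is false.
    """
    return p_premise * (1.0 - p_conclusion)


def consistency_loss(p_release, p_held):
    """Mean violation of 'release at t implies held at t-1' over a sequence."""
    pairs = list(zip(p_release[1:], p_held[:-1]))
    return sum(implication_penalty(r, h) for r, h in pairs) / len(pairs)


# Release predicted only after the object is confidently held: low penalty.
consistent = consistency_loss([0.0, 0.1, 0.9], [0.9, 0.9, 0.1])

# Release predicted while the object is probably not held: high penalty.
violating = consistency_loss([0.0, 0.9, 0.1], [0.1, 0.1, 0.9])

print(consistent < violating)
```

In a training setup, a term like this would typically be weighted and summed with the main objective, so that predictions violating the rule are discouraged without being forbidden outright.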
