CV | Apr 6, 2023

Therbligs in Action: Video Understanding through Motion Primitives

arXiv:2304.03631v1 | 11 citations | h-index: 54
Originality: Incremental advance
AI Analysis

This work addresses video action understanding. It gives computer vision researchers a Therblig-based representation that augments existing methods rather than replacing them; the advance is incremental because it builds on prior architectures.

The paper tackles video understanding by introducing Therbligs as motion primitives for rule-based, compositional, and hierarchical action modeling. It reports average relative improvements of 10.5%/7.53%/6.5% on EPIC Kitchens and 8.9%/6.63%/4.8% on 50-Salads for action segmentation, anticipation, and recognition, respectively.

In this paper we introduce a rule-based, compositional, and hierarchical modeling of action using Therbligs as our atoms. Introducing these atoms provides us with a consistent, expressive, contact-centered representation of action. Over the atoms we introduce a differentiable method of rule-based reasoning to regularize for logical consistency. Our approach is complementary to other approaches in that the Therblig-based representations produced by our architecture augment rather than replace existing architectures' representations. We release the first Therblig-centered annotations over two popular video datasets: EPIC Kitchens 100 and 50-Salads. We also broadly demonstrate benefits to adopting Therblig representations through evaluation on the following tasks: action segmentation, action anticipation, and action recognition, observing an average 10.5%/7.53%/6.5% relative improvement, respectively, over EPIC Kitchens and an average 8.9%/6.63%/4.8% relative improvement, respectively, over 50-Salads. Code and data will be made publicly available.
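To make the idea of regularizing for logical consistency concrete, here is a minimal sketch of a soft rule penalty. It is not the authors' formulation: the rule ("a release at frame t implies contact at frame t-1"), the function name `rule_violation`, and the product-style relaxation of the implication are all assumptions chosen for illustration. The penalty is a smooth function of the predicted probabilities, so in principle it could be added to a training loss.

```python
def rule_violation(p_release, p_in_contact):
    """Soft violation of a HYPOTHETICAL rule (not taken from the paper):
    'release at frame t implies contact at frame t-1'.

    Both arguments are per-frame probabilities in [0, 1].
    The implication A -> B is relaxed to the penalty p(A) * (1 - p(B)),
    which is 0 when the rule holds with certainty and 1 when it is
    violated with certainty.
    """
    if len(p_release) != len(p_in_contact):
        raise ValueError("sequences must have the same length")
    if len(p_release) < 2:
        return 0.0  # no frame pair to check
    terms = [
        p_release[t] * (1.0 - p_in_contact[t - 1])
        for t in range(1, len(p_release))
    ]
    # Average over frame pairs so the penalty does not grow with video length.
    return sum(terms) / len(terms)


# A release preceded by contact incurs no penalty.
consistent = rule_violation([0.0, 0.0, 1.0], [1.0, 1.0, 0.0])
# A release with no preceding contact is penalized.
inconsistent = rule_violation([0.0, 0.0, 1.0], [0.0, 0.0, 0.0])
```

In this sketch a sequence that respects the rule scores lower than one that breaks it, which is the property a consistency regularizer needs.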

