CVAISep 19, 2024

Interpretable Action Recognition on Hard to Classify Actions

arXiv:2409.13091v11 citationsh-index: 2
Originality Incremental advance
AI Analysis

This work addresses the challenge of distinguishing superficially similar actions in video recognition for applications requiring interpretability, but it is incremental as it builds on an existing model with specific extensions.

The paper tackled the problem of improving interpretable action recognition for hard-to-classify actions by adding 3D awareness to a model, resulting in a significant performance improvement from depth relations but no gain from a container detector.

We investigate a human-like interpretable model of video understanding. Humans recognise complex activities in video by recognising critical spatio-temporal relations among explicitly recognised objects and parts, for example, an object entering the aperture of a container. To mimic this we build on a model which uses positions of objects and hands, and their motions, to recognise the activity taking place. To improve this model we focussed on three of the most confused classes (for this model) and identified that the lack of 3D information was the major problem. To address this we extended our basic model by adding 3D awareness in two ways: (1) A state-of-the-art object detection model was fine-tuned to determine the difference between "Container" and "NotContainer" in order to integrate object shape information into the existing object features. (2) A state-of-the-art depth estimation model was used to extract depth values for individual objects and calculate depth relations to expand the existing relations used our interpretable model. These 3D extensions to our basic model were evaluated on a subset of three superficially similar "Putting" actions from the Something-Something-v2 dataset. The results showed that the container detector did not improve performance, but the addition of depth relations made a significant improvement to performance.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes