CV, AI | Aug 15, 2023

Action Class Relation Detection and Classification Across Multiple Video Datasets

Yuya Yoshikawa, Yutaro Shigeto, Masashi Shimbo, Akikazu Takeuchi

arXiv:2308.07558v1 | 1.52 citations | h-index: 21

Originality: Incremental advance

AI Analysis

This work addresses the need for dataset augmentation in human action recognition by enabling relation detection for external datasets, though it is incremental as it builds on existing pre-trained models and the MetaVD framework.

The paper tackles the problem of predicting relations between action classes across video datasets to enable dataset augmentation, proposing a unified model that uses language and visual information, with results showing that text-based predictions are more accurate than video-based ones and blending both modalities can improve performance.

The Meta Video Dataset (MetaVD) provides annotated relations between action classes in major datasets for human action recognition in videos. Although these annotated relations enable dataset augmentation, the augmentation is applicable only to the datasets covered by MetaVD. For an external dataset to enjoy the same benefit, the relations between its action classes and those in MetaVD need to be determined. To address this issue, we consider two new machine learning tasks: action class relation detection and classification. We propose a unified model to predict relations between action classes, using language and visual information associated with classes. Experimental results show that (i) recent pre-trained neural network models for texts and videos contribute to high predictive performance, (ii) relation prediction based on action label texts is more accurate than that based on videos, and (iii) a blending approach that combines predictions from both modalities can further improve the predictive performance in some cases.
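
As a rough illustration of point (iii), the sketch below blends text-based and video-based relation probabilities with a single weight. The abstract does not specify how the blending is done, so the convex combination, the function names `blend_predictions` and `detect_relations`, the weight `alpha`, and the 0.5 threshold are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def blend_predictions(p_text, p_video, alpha=0.5):
    """Blend two arrays of relation probabilities (hypothetical helper).

    p_text, p_video: probabilities for the same action-class pairs,
        one from a text-based predictor and one from a video-based predictor.
    alpha: weight on the text-based prediction (assumed, in [0, 1]).
    """
    p_text = np.asarray(p_text, dtype=float)
    p_video = np.asarray(p_video, dtype=float)
    if p_text.shape != p_video.shape:
        raise ValueError("both predictors must score the same class pairs")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Convex combination: alpha=1 uses text only, alpha=0 uses video only.
    return alpha * p_text + (1.0 - alpha) * p_video


def detect_relations(p_blend, threshold=0.5):
    """Flag class pairs whose blended probability reaches the threshold."""
    return np.asarray(p_blend) >= threshold


# Two candidate class pairs scored by each modality (made-up numbers).
blended = blend_predictions([0.9, 0.2], [0.5, 0.4], alpha=0.75)
related = detect_relations(blended)
```

In a setup like this, `alpha` would typically be chosen on held-out data; a value close to 1 would be consistent with the reported finding that text-based predictions are the more accurate of the two.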
