CV, Aug 24, 2023

Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models

arXiv:2308.13082v1, 2 citations, h-index: 39
Originality: Synthesis-oriented
AI Analysis

This work addresses efficiency challenges in temporal action localization for researchers and practitioners with limited data or computational resources. It is incremental: it benchmarks existing models and does not introduce new methods.

This paper benchmarks the data efficiency and computational efficiency of temporal action localization models. It finds that TemporalMaxer performs best with limited data and requires the least computational resources during inference, and it recommends TriDet when training time is limited.

In temporal action localization, given an input video, the goal is to predict which actions it contains, where they begin, and where they end. Training and testing current state-of-the-art deep learning models requires access to large amounts of data and computational power. However, gathering such data is challenging and computational resources might be limited. This work explores and measures how current deep temporal action localization models perform in settings constrained by the amount of data or computational power. We measure data efficiency by training each model on a subset of the training set. We find that TemporalMaxer outperforms other models in data-limited settings. Furthermore, we recommend TriDet when training time is limited. To test the efficiency of the models during inference, we pass videos of different lengths through each model. We find that TemporalMaxer requires the least computational resources, likely due to its simple architecture.
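The two measurements described in the abstract, training on a subset of the training set and passing inputs of different lengths through a model at inference time, can be sketched roughly as follows. This is a minimal illustration, not the authors' benchmark code: the function names `sample_training_subset` and `time_inference`, the subset fractions, the fixed seed, and the stand-in model are all assumptions made for the example, and wall-clock time is used here as only one possible proxy for computational cost.

```python
import random
import time


def sample_training_subset(video_ids, fraction, seed=0):
    """Return a reproducible random subset holding `fraction` of the videos.

    Hypothetical helper: the abstract does not specify how subsets are drawn.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so every model sees the same subset
    k = max(1, round(len(video_ids) * fraction))
    return sorted(rng.sample(list(video_ids), k))


def time_inference(model_fn, make_input, lengths, repeats=3):
    """Map each input length to the best wall-clock time over `repeats` runs.

    `model_fn` and `make_input` are placeholders for a real model's forward
    pass and a real feature loader.
    """
    timings = {}
    for length in lengths:
        features = make_input(length)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            model_fn(features)
            best = min(best, time.perf_counter() - start)
        timings[length] = best
    return timings


if __name__ == "__main__":
    ids = [f"video_{i:03d}" for i in range(200)]
    for fraction in (0.1, 0.25, 0.5, 1.0):
        subset = sample_training_subset(ids, fraction)
        print(f"fraction={fraction}: {len(subset)} training videos")

    # Stand-in "model": a sum of squares over a list of per-frame values.
    timings = time_inference(
        model_fn=lambda feats: sum(v * v for v in feats),
        make_input=lambda n: [0.5] * n,
        lengths=[1_000, 2_000, 4_000],
    )
    for length, seconds in timings.items():
        print(f"length={length}: {seconds:.6f} s")
```

In a real comparison, each model would be trained on the same subsets and scored with the benchmark's localization metric, and the stand-in callable would be replaced by the model's forward pass on video features of the chosen length.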
