CV · Aug 30, 2022

A Circular Window-based Cascade Transformer for Online Action Detection

arXiv:2208.14209v1 · 6 citations · h-index: 30
Originality: Incremental advance
AI Analysis

This work addresses real-time action prediction in streaming videos, offering an incremental improvement in efficiency and accuracy for video analysis applications.

The paper tackles the problem of online action detection by proposing a circular window-based cascade Transformer that efficiently updates only the latest and oldest historical representations while reusing intermediate ones, achieving state-of-the-art performance on THUMOS'14, TVSeries, and HDD datasets.

Online action detection aims to accurately predict the action in the current frame from long historical observations, while also demanding real-time inference on streaming video. In this paper, we advocate a novel and efficient principle for online action detection: update only the latest and oldest historical representations in a window and reuse the intermediate ones, which have already been computed. Based on this principle, we introduce a window-based cascade Transformer with a circular historical queue, which conducts multi-stage attention and cascade refinement on each window. We also explore the association between online action detection and its offline counterpart, action segmentation, using the latter as an auxiliary task. We find that this extra supervision helps discriminative history clustering and acts as feature augmentation for better training of the classifier and the cascade refinement. Our proposed method achieves state-of-the-art performance on three challenging datasets: THUMOS'14, TVSeries, and HDD. Code will be available after acceptance.
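The reuse principle described in the abstract can be illustrated with a small circular buffer. The sketch below is not the authors' implementation: the class `CircularHistory`, the function `toy_encode`, and the use of scalar "frames" are all assumptions made for illustration, and the Transformer attention and cascade refinement are omitted entirely. It only shows how a fixed-size circular queue lets each incoming frame be encoded once, overwriting the oldest slot while the intermediate cached representations stay untouched.

```python
from typing import Callable, List


class CircularHistory:
    """Hypothetical fixed-size circular queue of cached per-frame representations."""

    def __init__(self, capacity: int, encode: Callable[[float], float]):
        self.capacity = capacity
        self.encode = encode
        self.slots: List[float] = [0.0] * capacity
        self.head = 0          # slot that will be overwritten next (holds the oldest entry once full)
        self.size = 0          # number of valid entries currently stored
        self.encode_calls = 0  # counts how often the (expensive) encoder runs

    def push(self, frame: float) -> None:
        # Only the newest frame is encoded; it replaces the oldest slot.
        self.slots[self.head] = self.encode(frame)
        self.encode_calls += 1
        self.head = (self.head + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def window(self) -> List[float]:
        # Return cached representations in chronological order, oldest first.
        if self.size < self.capacity:
            return self.slots[:self.size]
        return self.slots[self.head:] + self.slots[:self.head]


def toy_encode(frame: float) -> float:
    # Stand-in for a real feature extractor (assumption for illustration).
    return frame * 2.0


history = CircularHistory(capacity=4, encode=toy_encode)
for t in range(6):
    history.push(float(t))

print(history.window())      # [4.0, 6.0, 8.0, 10.0]
print(history.encode_calls)  # 6
```

With six streamed frames and a window of four, the encoder runs six times in total, whereas re-encoding the full window at every step would run it once per frame in the window at each step. In a real model the slots would hold feature vectors rather than scalars.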

