CV · AI · Aug 17, 2021

Channel-Temporal Attention for First-Person Video Domain Adaptation

arXiv:2108.07846v2
Originality: Synthesis-oriented
AI Analysis

This paper addresses a domain-specific problem in first-person video analysis, with incremental contributions in dataset creation and model adaptation.

The paper tackles unsupervised domain adaptation for first-person action recognition by proposing two new datasets and a Channel-Temporal Attention Network (CTAN) that outperforms baselines on these and an existing dataset.

Unsupervised Domain Adaptation (UDA) can transfer knowledge from labeled source data to unlabeled target data of the same categories. However, UDA for first-person action recognition is an under-explored problem, with a lack of datasets and limited consideration of first-person video characteristics. This paper focuses on addressing this problem. First, we propose two small-scale first-person video domain adaptation datasets: ADL$_{small}$ and GTEA-KITCHEN. Second, we introduce channel-temporal attention blocks to capture the channel-wise and temporal-wise relationships and to model their inter-dependencies, which are important to first-person vision. Finally, we propose a Channel-Temporal Attention Network (CTAN) to integrate these blocks into existing architectures. CTAN outperforms baselines on the two proposed datasets and on one existing dataset, EPIC$_{cvpr20}$.
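
The abstract does not spell out how a channel-temporal attention block is built, so the following is only a minimal sketch of the general idea, not the paper's actual block. It assumes a squeeze-and-excitation-style design: spatial dimensions are pooled away, a small bottleneck MLP produces one gate per channel and one gate per time step, and their outer product rescales the features jointly over channel and time. The function name, the weight shapes, and the outer-product coupling are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_temporal_attention(x, w_c1, w_c2, w_t1, w_t2):
    """Hypothetical channel-temporal gate (assumed design, not from the paper).

    x: video features of shape (C, T, H, W).
    w_c1, w_c2: bottleneck weights for the channel branch, (C_r, C) and (C, C_r).
    w_t1, w_t2: bottleneck weights for the temporal branch, (T_r, T) and (T, T_r).
    Returns the reweighted features and the (C, T) gate.
    """
    squeezed = x.mean(axis=(2, 3))         # pool spatial dims -> (C, T)
    channel_desc = squeezed.mean(axis=1)   # one descriptor per channel -> (C,)
    temporal_desc = squeezed.mean(axis=0)  # one descriptor per time step -> (T,)

    # Two-layer bottleneck with ReLU, then sigmoid, for each branch.
    channel_gate = sigmoid(w_c2 @ np.maximum(w_c1 @ channel_desc, 0.0))
    temporal_gate = sigmoid(w_t2 @ np.maximum(w_t1 @ temporal_desc, 0.0))

    # Couple the two branches: each (channel, time) pair gets its own weight.
    gate = np.outer(channel_gate, temporal_gate)  # (C, T)
    return x * gate[:, :, None, None], gate

# Toy usage with random features and random (untrained) weights.
rng = np.random.default_rng(0)
C, T, H, W = 8, 4, 5, 5
x = rng.standard_normal((C, T, H, W))
w_c1 = rng.standard_normal((C // 2, C)) * 0.1
w_c2 = rng.standard_normal((C, C // 2)) * 0.1
w_t1 = rng.standard_normal((T // 2, T)) * 0.1
w_t2 = rng.standard_normal((T, T // 2)) * 0.1
y, gate = channel_temporal_attention(x, w_c1, w_c2, w_t1, w_t2)
```

In this sketch the output keeps the input shape, so a block of this kind could in principle be dropped between layers of an existing video backbone, which is the integration role the abstract attributes to CTAN.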
