CV · Mar 30, 2022

CycDA: Unsupervised Cycle Domain Adaptation from Image to Video

arXiv:2203.16244v3 · 11 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the high cost of video data annotation for action recognition by enabling adaptation from images, though it is incremental as it builds on existing domain adaptation methods.

The paper tackles adapting action recognition from labeling-free web images to unlabeled videos by addressing the spatial domain shift and the modality gap between image and video data, and reports state-of-the-art results on benchmark datasets.

Although action recognition has achieved impressive results in recent years, both the collection and the annotation of video training data remain time-consuming and cost-intensive. Image-to-video adaptation has therefore been proposed to exploit labeling-free web images as a source for adapting to unlabeled target videos. This poses two major challenges: (1) the spatial domain shift between web images and video frames; (2) the modality gap between image and video data. To address these challenges, we propose Cycle Domain Adaptation (CycDA), a cycle-based approach for unsupervised image-to-video domain adaptation that, on the one hand, leverages the joint spatial information in images and videos and, on the other hand, trains an independent spatio-temporal model to bridge the modality gap. We alternate between spatial and spatio-temporal learning, with knowledge transfer between the two in each cycle. We evaluate our approach on benchmark datasets for image-to-video as well as mixed-source domain adaptation, achieving state-of-the-art results and demonstrating the benefits of our cyclic adaptation. Code is available at https://github.com/wlin-at/CycDA.
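The alternation described in the abstract can be pictured as a simple control loop. The sketch below is only an illustration of that loop and is not taken from the authors' repository: the function names (`run_cycles`, `toy_spatial`, `toy_spatiotemporal`) are hypothetical, and the use of integer pseudo labels as the "knowledge" passed between the two learners is an assumption, since the abstract does not say what form the transfer takes.

```python
from typing import Callable, List, Optional

Labels = List[int]


def run_cycles(
    num_cycles: int,
    train_spatial: Callable[[Optional[Labels]], Labels],
    train_spatiotemporal: Callable[[Labels], Labels],
) -> List[Labels]:
    """Alternate between a spatial and a spatio-temporal learner.

    Each callable is a placeholder for a full training stage. The labels
    they exchange stand in for whatever knowledge is transferred
    (assumed here to be pseudo labels on the target videos).
    """
    history: List[Labels] = []
    video_labels: Optional[Labels] = None  # no video-model feedback before cycle 1
    for _ in range(num_cycles):
        # Spatial stage: image/frame-level learning, optionally guided by
        # the previous cycle's output from the video model.
        frame_labels = train_spatial(video_labels)
        # Spatio-temporal stage: a separate video model trained using the
        # spatial stage's output.
        video_labels = train_spatiotemporal(frame_labels)
        history.append(list(video_labels))
    return history


# Toy stand-ins (hypothetical) so the loop can be executed end to end.
def toy_spatial(feedback: Optional[Labels]) -> Labels:
    # First cycle: initial guesses; later cycles: reuse the feedback as-is.
    return [0, 1, 1, 0] if feedback is None else list(feedback)


def toy_spatiotemporal(frame_labels: Labels) -> Labels:
    # Pass the labels through unchanged; a real stage would refine them.
    return list(frame_labels)


history = run_cycles(2, toy_spatial, toy_spatiotemporal)
```

Passing the two stages in as callables keeps the loop independent of any particular model or framework; the actual training procedure is in the linked repository.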

Code Implementations: 1 repo