VideoDG: Generalizing Temporal Relations in Videos to Novel Domains
This work addresses the challenge of generalizing video classification to novel domains. The contribution is incremental in that it builds on existing domain generalization methods, but it introduces techniques specific to temporal relations.
The paper tackles video domain generalization, where video classification networks fail on unseen domains because of temporal domain shift. It proposes the VideoDG framework, which learns local-relation features for better generalizability and uses adversarial data augmentation to bridge domains. VideoDG consistently outperforms previous methods on three benchmarks.
This paper introduces video domain generalization, a setting in which most video classification networks degrade because they have no exposure to target domains with divergent distributions. We observe that global temporal features are less generalizable because of temporal domain shift: videos from unseen domains may exhibit an unexpected absence or misalignment of temporal relations. This finding motivates us to solve video domain generalization by effectively learning local-relation features at different timescales, which are more generalizable, and exploiting them together with global-relation features to maintain discriminability. This paper presents the VideoDG framework with two technical contributions. The first is a new deep architecture named the Adversarial Pyramid Network, which improves the generalizability of video features by progressively capturing local-relation, global-relation, and cross-relation features. The second, built on the pyramid features, is a new and robust approach to adversarial data augmentation that bridges different video domains by improving the diversity and quality of the augmented data. We construct three video domain generalization benchmarks in which domains are divided according to different datasets, different consequences of actions, or different camera views, respectively. VideoDG consistently outperforms combinations of previous video classification models and existing domain generalization methods on all benchmarks.
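To make the distinction between local-relation and global-relation features concrete, the following is a minimal sketch, not the Adversarial Pyramid Network itself. It assumes a video is already represented as a list of per-frame feature vectors, and it uses plain mean pooling as a placeholder for whatever learned relation module the paper uses (the abstract does not specify one). The names `mean_pool`, `local_relation_features`, and `pyramid_features` are hypothetical. Cross-relation features and adversarial augmentation are not modeled here.

```python
from typing import Dict, List

Frame = List[float]  # one feature vector per sampled frame (assumed input)


def mean_pool(frames: List[Frame]) -> Frame:
    """Average a list of equal-length frame feature vectors.

    Placeholder for a learned relation module; an assumption of this sketch.
    """
    n = len(frames)
    return [sum(values) / n for values in zip(*frames)]


def local_relation_features(frames: List[Frame], timescale: int) -> List[Frame]:
    """Pool every window of `timescale` consecutive frames (stride 1)."""
    if not 1 <= timescale <= len(frames):
        raise ValueError("timescale must be between 1 and the number of frames")
    return [
        mean_pool(frames[start:start + timescale])
        for start in range(len(frames) - timescale + 1)
    ]


def pyramid_features(frames: List[Frame]) -> Dict[str, object]:
    """Collect local features for timescales 2..T-1 plus one global feature."""
    num_frames = len(frames)
    return {
        "local": {
            k: local_relation_features(frames, k) for k in range(2, num_frames)
        },
        "global": mean_pool(frames),
    }


# Four frames with 2-dimensional features (toy values).
video = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
features = pyramid_features(video)
```

In this toy setup, each local feature depends only on a short window of frames, so a missing or shifted segment elsewhere in the video leaves most local features unchanged, whereas the single global feature is affected by every frame. This is one way to read the abstract's claim that local-relation features are more generalizable under temporal domain shift.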