CV · Jun 15, 2021

Domain Adaptive SiamRPN++ for Object Tracking in the Wild

arXiv:2106.07862v1 · 8 citations
Originality: Incremental advance
AI Analysis

The paper addresses a domain adaptation gap in visual tracking for real-world applications such as autonomous driving, but the advance is incremental because it builds on the existing SiamRPN++ method.

The paper tackles domain shift in visual object tracking: trackers trained on normal images perform poorly on foggy or rainy sequences. It introduces Domain Adaptive SiamRPN++ (DASiamRPN++), which adds pixel-level and semantic-level adaptation modules, and reports improved cross-domain transferability and robustness on synthetic foggy and TIR datasets.

Benefiting from large-scale training data, recent Siamese-based object trackers have achieved compelling results on normal sequences. However, Siamese-based trackers assume that training and test data follow an identical distribution. Given a set of foggy or rainy test sequences, there is no guarantee that trackers trained on normal images will perform well on data from other domains. The problem of domain shift between training and test data has been discussed in object detection and semantic segmentation, but it has not been investigated for visual tracking. To this end, building on SiamRPN++, we introduce Domain Adaptive SiamRPN++ (DASiamRPN++) to improve the cross-domain transferability and robustness of a tracker. Inspired by A-distance theory, we present two domain adaptive modules: Pixel Domain Adaptation (PDA) and Semantic Domain Adaptation (SDA). The PDA module aligns the feature maps of the template and search region images to eliminate the pixel-level domain shift caused by weather, illumination, etc. The SDA module aligns the feature representations of the tracking target's appearance to eliminate the semantic-level domain shift. Both modules reduce the domain disparity by learning domain classifiers in an adversarial training manner, which forces the network to learn domain-invariant feature representations. Extensive experiments on standard datasets from two different domains, synthetic foggy sequences and TIR sequences, demonstrate the transferability and domain adaptability of the proposed tracker.
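The abstract names A-distance theory as the motivation for the domain classifiers but gives no formula. As a rough illustration only, the sketch below computes the proxy A-distance commonly used in the domain adaptation literature, 2 * (1 - 2 * err), where err is the error of a classifier trained to tell the two domains apart. The function name `proxy_a_distance` and the label convention (0 = normal, 1 = foggy) are assumptions made for this example; they are not taken from the paper, and this is not the authors' implementation.

```python
def proxy_a_distance(domain_labels, domain_predictions):
    """Return the proxy A-distance 2 * (1 - 2 * err).

    err is the fraction of samples whose domain the classifier got wrong.
    Hypothetical helper for illustration; labels are 0 (source) or 1 (target).
    """
    if not domain_labels or len(domain_labels) != len(domain_predictions):
        raise ValueError("labels and predictions must be non-empty and equal in length")
    wrong = sum(1 for y, p in zip(domain_labels, domain_predictions) if y != p)
    err = wrong / len(domain_labels)
    return 2.0 * (1.0 - 2.0 * err)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # assumed convention: 0 = normal frame, 1 = foggy frame

    # The classifier separates the domains perfectly: features are domain-specific.
    print(proxy_a_distance(labels, [0, 0, 1, 1]))  # 2.0

    # The classifier is at chance level: features look domain-invariant.
    print(proxy_a_distance(labels, [0, 1, 1, 0]))  # 0.0
```

Under this reading, adversarial training pushes the feature extractor toward the second case: the harder it is for the domain classifier to separate normal from foggy features, the smaller the estimated distance between the two domains.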

