CV · Nov 20, 2019

Instance-Invariant Domain Adaptive Object Detection via Progressive Disentanglement

Aming Wu, Yahong Han, Linchao Zhu, Yi Yang

arXiv:1911.08712v4 · 13.38 citations

Originality: Incremental advance

AI Analysis

This paper addresses domain adaptation for object detection, which is crucial for real-world applications where training and test data differ. The contribution appears incremental, as it builds on existing disentanglement techniques.

The paper tackles the problem of poor generalization in object detection across different domains by proposing a progressive disentangled framework to extract instance-invariant features, achieving improvements of 2.3%, 3.6%, and 4.0% over a baseline method on three domain-shift scenes.

Most state-of-the-art object detection methods generalize poorly when the training and test data come from different domains, e.g., with different styles. To address this problem, previous methods mainly use holistic representations to align feature-level and pixel-level distributions across domains, which may neglect the instance-level characteristics of objects in images. Moreover, when transferring detection ability across domains, it is important to obtain instance-level features that are domain-invariant rather than styles that are domain-specific. Therefore, to extract instance-invariant features, we should disentangle the domain-invariant features from the domain-specific features. To this end, we propose, for the first time, a progressive disentangled framework for domain adaptive object detection. Specifically, based on disentangled learning for feature decomposition, we devise two disentangled layers to decompose domain-invariant and domain-specific features, and the instance-invariant features are then extracted from the domain-invariant features. Finally, to enhance the disentanglement, we devise a three-stage training mechanism with multiple loss functions to optimize our model. In the experiments, we verify the effectiveness of our method on three domain-shift scenes, where it outperforms the baseline method (Saito et al., 2019) by 2.3%, 3.6%, and 4.0%, respectively.
