From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation
This work addresses noisy pseudo labels in domain adaptation for animal pose estimation, a problem relevant to researchers in computer vision and animal behavior analysis. The contribution is incremental, as it builds on existing pseudo-label and domain adaptation techniques.
The paper tackles the lack of labeled data in animal pose estimation by proposing an unsupervised domain adaptation method that reduces the domain gap between synthetic and real data, outperforming existing approaches by a large margin on the TigDog and VisDA 2019 datasets.
Animal pose estimation is an important field that has received increasing attention in recent years. The main challenge for this task is the lack of labeled data. Existing works circumvent this problem with pseudo labels generated from data of other, more easily accessible domains such as synthetic data. However, because of the domain shift in the data, these pseudo labels remain noisy even after consistency checks or confidence-based filtering. To solve this problem, we design a multi-scale domain adaptation module (MDAM) to reduce the domain gap between the synthetic and real data. We further introduce an online coarse-to-fine pseudo label updating strategy. Specifically, we propose a self-distillation module in an inner coarse-update loop and a mean teacher in an outer fine-update loop to generate new pseudo labels that gradually replace the old ones. Consequently, our model is able to learn from the old pseudo labels at the early stage and gradually switch to the new pseudo labels in the later stage, which prevents overfitting to the noisy old labels. We evaluate our approach on the TigDog and VisDA 2019 datasets, where we outperform existing approaches by a large margin. We also demonstrate the generalization ability of our model by testing extensively on both unseen domains and unseen animal categories. Our code is available at the project website.
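As a rough illustration of the two ideas named in the abstract, the sketch below shows a mean-teacher weight update (an exponential moving average of the student weights) and a schedule that gradually shifts from old to new pseudo labels. It is not the released implementation: the function names, the decay value, the linear schedule, and the coordinate-level blending are all assumptions made for illustration, since the abstract only states that new labels gradually replace the old ones.

```python
def ema_update(teacher, student, decay=0.99):
    """Mean-teacher step: move each teacher weight toward the student weight.

    `decay` is a hypothetical value; the abstract does not specify one.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def replacement_weight(step, total_steps):
    """Share of the new pseudo label to use, rising linearly from 0 to 1.

    The linear schedule is an assumption, not the paper's stated rule.
    """
    return min(1.0, max(0.0, step / total_steps))


def blend_pseudo_label(old_label, new_label, step, total_steps):
    """Mix old and new keypoint coordinates according to the schedule."""
    w = replacement_weight(step, total_steps)
    return [(1.0 - w) * o + w * n for o, n in zip(old_label, new_label)]


# Toy weights standing in for network parameters.
teacher = [0.0, 1.0]
student = [1.0, 0.0]
teacher = ema_update(teacher, student, decay=0.9)

# One keypoint (x, y): the original pseudo label and the teacher's prediction.
old_kp = [10.0, 20.0]
new_kp = [14.0, 24.0]
halfway = blend_pseudo_label(old_kp, new_kp, step=50, total_steps=100)
```

Early in training the blend stays close to the old pseudo label, and by the final step it equals the teacher's prediction, which mirrors the early-to-late switch described above.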