Explored An Effective Methodology for Fine-Grained Snake Recognition
This work addresses a domain-specific computer vision problem, fine-grained snake identification, with incremental methodological improvements.
The paper tackles fine-grained snake recognition by proposing a multimodal backbone that uses meta-information, new loss functions for the long-tailed class distribution, and joint self-supervised and supervised pre-training. It reports macro F1 scores of 92.7% and 89.4% on the private and public datasets, respectively, placing first on the private leaderboard of the SnakeCLEF2022 competition.
Fine-Grained Visual Classification (FGVC) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. This paper describes our FGVC contribution to SnakeCLEF2022. First, we design a strong multimodal backbone that uses various meta-information to assist fine-grained identification. Second, we provide new loss functions to address the long-tailed distribution of the dataset. Third, to take full advantage of unlabeled data, we jointly train with self-supervised and supervised learning to obtain a pre-trained model. We also apply several effective data processing tricks in our experiments. Finally, we fine-tune on the downstream task with hard mining and ensemble several kinds of models. Extensive experiments demonstrate that our method effectively improves fine-grained recognition performance. It achieves macro F1 scores of 92.7% and 89.4% on the private and public datasets, respectively, which ranks first among participants on the private leaderboard.
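Because the reported results are macro F1 scores, it may help to see how that metric treats rare classes. The following is a minimal sketch of the standard macro F1 definition (the unweighted mean of per-class F1), not the competition's official scoring code. The function name `macro_f1` and the toy labels `"common"` and `"rare"` are hypothetical, and averaging over the union of true and predicted labels is an assumption of this sketch.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative sketch)."""
    # Assumption: average over every label seen in either list.
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy long-tailed example: the rare class is always misclassified.
y_true = ["common", "common", "common", "common", "rare"]
y_pred = ["common", "common", "common", "common", "common"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

In this toy case accuracy is 0.8 while macro F1 is about 0.444, since the missed rare class contributes a per-class F1 of zero with the same weight as the common class. This is one reason a long-tailed dataset motivates losses that pay attention to tail classes.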