3rd Place Solution for Google Universal Image Embedding
This is an incremental solution for image embedding tasks, specifically targeting competition performance.
The paper tackles the Google Universal Image Embedding Competition using a ViT-H/14 backbone with ArcFace and a two-stage training approach, achieving a mean Precision@5 of 0.692 on the private leaderboard.
This paper presents the 3rd place solution to the Google Universal Image Embedding Competition on Kaggle. We use the ViT-H/14 model from OpenCLIP as the backbone of an ArcFace model and train it in two stages: the first stage trains with the backbone frozen, and the second stage trains the whole model. We achieve a mean Precision@5 of 0.692 on the private leaderboard. Code is available at https://github.com/YasumasaNamba/google-universal-image-embedding
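To make the ArcFace component concrete, the following is a minimal pure-Python sketch of how ArcFace-style logits are commonly computed for a single embedding. It is not taken from the linked repository: the function names, the `scale` and `margin` values, and the clamping of the shifted angle at pi are illustrative assumptions, and the abstract does not state the hyperparameters actually used.

```python
import math


def l2_normalize(vec):
    """Return vec scaled to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def arcface_logits(embedding, class_weights, label, scale=30.0, margin=0.5):
    """Compute ArcFace-style logits for one embedding.

    scale and margin are placeholder values (assumptions), not the
    settings used in the paper.
    """
    emb = l2_normalize(embedding)
    logits = []
    for class_idx, weight in enumerate(class_weights):
        # Cosine similarity between the unit embedding and the unit class weight.
        cos = sum(a * b for a, b in zip(emb, l2_normalize(weight)))
        cos = max(-1.0, min(1.0, cos))  # guard against rounding error
        if class_idx == label:
            # Add the angular margin to the target class only.
            # Clamping at pi is a simplification chosen for this sketch.
            theta = math.acos(cos)
            cos = math.cos(min(math.pi, theta + margin))
        logits.append(scale * cos)
    return logits


# Toy example: a 2-D embedding and two class weight vectors.
logits = arcface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], label=0)
```

The resulting logits would be passed to a softmax cross-entropy loss. In the two-stage schedule described above, only the layers after the backbone would receive gradient updates in the first stage, and the backbone would be unfrozen in the second.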