5th Place Solution to Kaggle Google Universal Image Embedding Competition
This is an incremental solution aimed at participants in image embedding competitions: a specific implementation that placed 5th.
The paper addresses image embedding for a Kaggle competition using a CLIP ViT-H backbone, a head trained with ArcFace, and test-time augmentation (TTA), scoring 0.684 on the public leaderboard and 0.688 on the private leaderboard.
In this paper, we present our solution, which placed 5th in the Kaggle Google Universal Image Embedding Competition in 2022. We use the ViT-H visual encoder of CLIP from the OpenCLIP repository as a backbone and train a head model composed of BatchNormalization and Linear layers using ArcFace. The dataset is a subset of products10K, GLDv2, GPR1200, and Food101. Applying test-time augmentation (TTA) to part of the images further improves the score. With this method, we achieve a score of 0.684 on the public leaderboard and 0.688 on the private leaderboard. Our code is available at https://github.com/riron1206/kaggle-Google-Universal-Image-Embedding-Competition-5th-Place-Solution
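The head and loss described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the code from the linked repository: the class names `EmbeddingHead` and `ArcFaceLoss`, the function `embed_with_flip_tta`, the feature size (1024), the embedding size (64), the scale and margin values, and the use of a horizontal flip as the TTA transform are all assumptions. The abstract does not specify any of them. Random tensors stand in for backbone features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbeddingHead(nn.Module):
    """BatchNorm + Linear head applied to backbone features (dims are assumed)."""

    def __init__(self, in_dim: int = 1024, out_dim: int = 64):
        super().__init__()
        self.bn = nn.BatchNorm1d(in_dim)
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.fc(self.bn(feats))


class ArcFaceLoss(nn.Module):
    """Simplified ArcFace: adds an angular margin to the target-class logit."""

    def __init__(self, emb_dim: int, num_classes: int,
                 scale: float = 30.0, margin: float = 0.3):
        super().__init__()
        # One learnable class centre per class.
        self.weight = nn.Parameter(torch.empty(num_classes, emb_dim))
        nn.init.xavier_uniform_(self.weight)
        self.scale = scale    # assumed value
        self.margin = margin  # assumed value

    def forward(self, emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between L2-normalised embeddings and class centres.
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        cos = cos.clamp(-1 + 1e-7, 1 - 1e-7)  # keep acos numerically stable
        theta = torch.acos(cos)
        target = F.one_hot(labels, cos.size(1)).bool()
        # Apply the margin only to the ground-truth class.
        logits = torch.where(target, torch.cos(theta + self.margin), cos)
        return F.cross_entropy(self.scale * logits, labels)


@torch.no_grad()
def embed_with_flip_tta(encoder: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Hypothetical TTA: sum embeddings of each image and its horizontal flip."""
    emb = encoder(images) + encoder(torch.flip(images, dims=[-1]))
    return F.normalize(emb, dim=-1)


# Usage with random stand-ins for backbone features and labels.
torch.manual_seed(0)
feats = torch.randn(8, 1024)
labels = torch.randint(0, 10, (8,))
head = EmbeddingHead(1024, 64)
criterion = ArcFaceLoss(emb_dim=64, num_classes=10)
loss = criterion(head(feats), labels)
loss.backward()  # gradients reach both the head and the class centres
```

In this sketch only the head and the ArcFace class centres receive gradients, which corresponds to training a head on top of backbone features. Whether the backbone was frozen or fine-tuned is not stated in the abstract.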