CV | Oct 18, 2022

5th Place Solution to Kaggle Google Universal Image Embedding Competition

arXiv:2210.09495v1 | 3 citations | h-index: 3 | Has Code
Originality: Synthesis-oriented
AI Analysis

This is an incremental solution for participants in image embedding competitions, offering a specific implementation that placed 5th.

The paper tackles image embedding for a Kaggle competition by using a CLIP ViT-H backbone with an ArcFace-trained head and test-time augmentation (TTA), achieving scores of 0.684 on the public leaderboard and 0.688 on the private leaderboard.

In this paper, we present our solution, which placed 5th in the Kaggle Google Universal Image Embedding Competition in 2022. We use the ViT-H visual encoder of CLIP from the OpenCLIP repository as a backbone and train a head model composed of BatchNormalization and Linear layers using ArcFace. The training data is a subset of Products-10K, GLDv2, GPR1200, and Food101. Applying test-time augmentation (TTA) to part of the images also improves the score. With this method, we achieve a score of 0.684 on the public leaderboard and 0.688 on the private leaderboard. Our code is available at https://github.com/riron1206/kaggle-Google-Universal-Image-Embedding-Competition-5th-Place-Solution
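
The abstract names ArcFace as the training objective for the head but gives no details, so the following is only a minimal sketch of the standard ArcFace idea: add an angular margin to the target-class angle before scaling and applying softmax cross-entropy. It uses plain Python rather than the authors' code. The function names, the toy vectors, and the scale (30.0) and margin (0.5) values are illustrative assumptions, not settings taken from the paper.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def arcface_logits(embedding, class_weights, label, scale=30.0, margin=0.5):
    """Return scaled cosine logits with an angular margin on the target class.

    scale and margin are assumed illustrative values, not the paper's settings.
    """
    e = l2_normalize(embedding)
    logits = []
    for idx, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if idx == label:
            # Basic form of the margin; no special handling of theta + margin > pi.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

# Toy example: a 3-dimensional "head output" and two class centers (hypothetical data).
embedding = [0.9, 0.1, 0.0]
class_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

plain = arcface_logits(embedding, class_weights, label=0, margin=0.0)
with_margin = arcface_logits(embedding, class_weights, label=0)
print(cross_entropy(plain, 0), cross_entropy(with_margin, 0))
```

The margin lowers only the target-class logit, so the loss is higher for the same embedding. That pushes training toward embeddings that sit closer to their class center, which is the property an embedding competition rewards.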

Code Implementations: 1 repo