CV · Oct 14, 2022

3rd Place Solution for Google Universal Image Embedding

arXiv:2210.09296v1 · 2 citations · h-index: 3 · Has Code
Originality: Synthesis-oriented
AI Analysis

This is an incremental solution for image embedding tasks, specifically targeting competition performance.

The paper tackles the Google Universal Image Embedding Competition with a ViT-H/14 backbone, an ArcFace head, and a two-stage training approach, achieving a mean Precision @5 of 0.692 on the private leaderboard.
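To make the reported metric concrete, here is a minimal sketch of how a mean Precision @5 score could be computed. It assumes the per-query normalizer is min(n_q, 5), where n_q is the number of relevant index images for query q; the competition page remains the authoritative definition. The function name `mean_precision_at_5` and the dictionary layout are hypothetical and are not taken from the paper or its code.

```python
def mean_precision_at_5(predictions, relevant):
    """Average precision@5 over queries.

    predictions: dict mapping query id -> ranked list of index ids (best first)
    relevant:    dict mapping query id -> set of relevant index ids (non-empty)
    Assumption: each query is normalized by min(n_q, 5), n_q = len(relevant[q]).
    """
    total = 0.0
    for query, ranked in predictions.items():
        rel = relevant[query]
        cutoff = min(len(rel), 5)
        # Count how many of the top `cutoff` predictions are relevant.
        hits = sum(1 for idx in ranked[:cutoff] if idx in rel)
        total += hits / cutoff
    return total / len(predictions)


# Toy data, purely illustrative.
predictions = {
    "q1": ["a", "b", "c", "d", "e"],
    "q2": ["x", "y", "z", "w", "v"],
}
relevant = {
    "q1": {"a", "c", "r1", "r2", "r3", "r4"},  # 2 of the top 5 are relevant
    "q2": {"x"},                               # only one relevant image, found at rank 1
}
score = mean_precision_at_5(predictions, relevant)
```

With this normalizer, a query that has a single relevant index image can still reach a perfect score if that image is ranked first.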

This paper presents the 3rd place solution to the Google Universal Image Embedding Competition on Kaggle. We use ViT-H/14 from OpenCLIP as the backbone of an ArcFace model and train it in two stages. In the first stage the backbone is frozen; in the second stage the whole model is trained. We achieve a mean Precision @5 of 0.692 on the private leaderboard. Code is available at https://github.com/YasumasaNamba/google-universal-image-embedding
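As an illustration of the ArcFace head mentioned in the abstract, the following is a minimal pure-Python sketch of the additive angular margin applied to the logits. It is not the authors' implementation: the function name `arcface_logits` is hypothetical, the `scale` and `margin` defaults are placeholder assumptions rather than values from the paper, and the sketch ignores the edge case where the angle plus the margin exceeds pi.

```python
import math


def _normalize(vec):
    """Return vec scaled to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def arcface_logits(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Compute ArcFace-style logits for one sample.

    embedding:     list of floats (e.g. the backbone output)
    class_weights: list of per-class weight vectors
    target:        index of the ground-truth class
    scale, margin: assumed placeholder hyperparameters, not from the paper
    """
    emb = _normalize(embedding)
    logits = []
    for k, weight in enumerate(class_weights):
        # Cosine similarity between the normalized embedding and class center.
        cos = sum(a * b for a, b in zip(emb, _normalize(weight)))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if k == target:
            # Add the angular margin to the target class only.
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)
    return logits


# Toy example: the embedding is perfectly aligned with class 0.
logits = arcface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], target=0)
```

The resulting logits would be passed to a softmax cross-entropy loss. The margin lowers the target logit during training, which pushes embeddings of the same class closer together in angle.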

