QK Iteration: A Self-Supervised Representation Learning Algorithm for Image Similarity
This work addresses efficient contrastive learning for image copy detection; it is an incremental improvement over existing methods in a specific domain.
The paper tackles self-supervised representation learning for image similarity, specifically copy detection, by proposing QK Iteration, a contrastive algorithm that pushes representations against a large number of negative examples. The result is a significant improvement in micro-AP, e.g., from a baseline of 0.1556 to 0.3401 on the Phase 1 leaderboard of the 2021 Image Similarity Challenge.
Self-supervised representation learning is a fundamental problem in computer vision with many useful applications (e.g., image search, instance-level recognition, copy detection). In this paper, we present a new contrastive self-supervised representation learning algorithm in the context of copy detection in the 2021 Image Similarity Challenge hosted by Facebook AI Research. Previous work in contrastive self-supervised learning has identified the importance of optimizing representations while ``pushing'' against a large number of negative examples. Representative previous solutions either use large batches enabled by modern distributed training systems, or maintain queues or memory banks that hold recently evaluated representations while relaxing some consistency properties. We approach this problem from a new angle: we directly learn a query model and a key model jointly, and push representations against a very large number (e.g., 1 million) of negative representations in each SGD step. We achieve this by freezing the backbone on one side and by alternating between a Q-optimization step and a K-optimization step. During the competition timeframe, our algorithms achieved a micro-AP ($\mu$AP) score of 0.3401 on the Phase 1 leaderboard, significantly improving over the baseline $\mu$AP of 0.1556. On the final Phase 2 leaderboard, our model scored 0.1919, while the baseline scored 0.0526. Continued training yielded further improvement. We conducted an empirical study to compare the proposed approach with a SimCLR-style strategy in which the negative examples are taken from the batch only. We found that our method ($\mu$AP of 0.3403) significantly outperforms this SimCLR-style baseline ($\mu$AP of 0.2001).
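To make the alternation between a Q-optimization step and a K-optimization step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes an InfoNCE-style softmax loss, a temperature of 0.5, free embedding vectors in place of real query and key networks, a toy bank of 1,000 negatives instead of 1 million, and a fully frozen opposite side during each step; the names `info_nce`, `sgd_step`, `q_bank`, and `k_bank` are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def softmax_probs(anchor, positive, negatives, temperature):
    # Index 0 is the positive; the rest are negatives from the frozen side.
    logits = [dot(anchor, k) / temperature for k in [positive] + negatives]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def info_nce(anchor, positive, negatives, temperature=0.5):
    # Assumed loss: negative log-probability assigned to the positive.
    return -math.log(softmax_probs(anchor, positive, negatives, temperature)[0])


def sgd_step(anchor, positive, negatives, lr=0.01, temperature=0.5):
    # Gradient w.r.t. the anchor: (sum_j p_j * k_j - k_pos) / temperature.
    # Only the anchor moves; the positive and the bank are treated as frozen.
    probs = softmax_probs(anchor, positive, negatives, temperature)
    keys = [positive] + negatives
    grad = [
        (sum(p * k[d] for p, k in zip(probs, keys)) - positive[d]) / temperature
        for d in range(len(anchor))
    ]
    return [a - lr * g for a, g in zip(anchor, grad)]


rng = random.Random(0)
dim, bank_size = 8, 1000  # toy sizes; the abstract mentions e.g. 1 million


def random_unit():
    return unit([rng.gauss(0, 1) for _ in range(dim)])


q = random_unit()  # query-side representation of one image
k = random_unit()  # key-side representation of its positive
k_bank = [random_unit() for _ in range(bank_size)]  # frozen key negatives
q_bank = [random_unit() for _ in range(bank_size)]  # frozen query negatives

loss_before = info_nce(q, k, k_bank)
loss_after = info_nce(sgd_step(q, k, k_bank), k, k_bank)

# Alternate: a Q-step updates q against frozen keys, a K-step does the reverse.
for step in range(4):
    if step % 2 == 0:
        q = sgd_step(q, k, k_bank)  # Q-optimization step
    else:
        k = sgd_step(k, q, q_bank)  # K-optimization step

print(loss_before, loss_after)
```

Because the opposite side is frozen in this sketch, its representations can be computed once and reused as a large bank, so each step only backpropagates through the side being trained. In a real system the banks would presumably be refreshed as the models change; that detail is not covered here.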