CVCLMay 9, 2023

Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data

arXiv:2305.05208v313 citationsHas Code
Originality Incremental advance
AI Analysis

This work addresses the resource and time costs of enhancing CLIP models for researchers and practitioners, offering a cost-effective, incremental improvement.

The paper tackles the problem of improving visual-language models like CLIP without requiring additional data or extensive retraining by introducing HELIP, a strategy that refines challenging text-image pairs during continuous training. The result is significant performance gains, such as boosting zero-shot classification accuracy on ImageNet by up to 10.1% and improving fine-grained classification by an average of up to 18.6%.

Contrastive Language-Image Pre-training (CLIP) has become the standard for cross-modal image-text representation learning. Improving CLIP typically requires additional data and retraining with new loss functions, but these demands raise resource and time costs, limiting practical use. In this work, we introduce HELIP, a cost-effective strategy that improves CLIP models by exploiting challenging text-image pairs within existing datasets in continuous training. This eliminates the need for additional data or extensive retraining. Moreover, HELIP integrates effortlessly into current training pipelines with minimal code modifications, allowing for quick and seamless implementation. On comprehensive benchmarks, HELIP consistently boosts existing models. In particular, within just two epochs of training, it improves zero-shot classification accuracy on ImageNet for SLIP models pre-trained on CC3M, CC12M, and YFCC15M datasets by 3.05%, 4.47%, and 10.1% , respectively. In addition, on fine-grained classification datasets, HELIP improves the zero-shot performance of CLIP and SLIP by an average of 8.4% and 18.6%, and their linear probe performance by an average of 9.5% and 3.0%. The code is publicly available at: https://github.com/haonan3/HELIP-NACCL-2025.git.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes