CV · May 11, 2023

An Inverse Scaling Law for CLIP Training

arXiv:2305.07017v2 · 86 citations · Has Code
Originality: Incremental advance
AI Analysis

The work lowers the computational barrier to CLIP training and makes it more accessible for academic research. The advance is incremental, since it builds on existing CLIP methods.

The paper tackles the high computational cost of training CLIP models. It reports an inverse scaling law: the larger the image/text encoders, the shorter the image/text token sequences that can be used in training, which makes training feasible with limited resources. For example, using 8 A100 GPUs, the models reach zero-shot top-1 ImageNet-1k accuracies of up to 69.3% in ~4 days. When scaled up to G/14, the method sets a new record of 83.0% zero-shot ImageNet-1k accuracy while accelerating training by ~33x relative to its OpenCLIP counterpart.
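As a rough illustration of why shorter token sequences save compute, the sketch below counts the patch tokens a ViT-style image encoder sees at different input resolutions and the relative size of the quadratic self-attention term. The function names and the resolutions are hypothetical choices for illustration and are not taken from the paper. Only the attention term is modeled, so this is not an estimate of total training cost.

```python
def num_image_tokens(image_size: int, patch_size: int) -> int:
    """Patch tokens for a square image (class token excluded)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2


def relative_attention_cost(tokens: int, baseline_tokens: int) -> float:
    """Ratio of the pairwise attention term, which grows quadratically
    with sequence length. Other layers scale roughly linearly and are
    deliberately ignored here."""
    return (tokens / baseline_tokens) ** 2


PATCH = 14  # the "/14" in a name such as G/14 denotes a 14x14 patch size
baseline = num_image_tokens(224, PATCH)

# Illustrative resolutions only (assumption, not the paper's settings).
for size in (224, 112, 84):
    n = num_image_tokens(size, PATCH)
    print(f"{size}px -> {n} tokens, "
          f"attention cost x{relative_attention_cost(n, baseline):.4f}")
```

Halving the side length of the input cuts the token count by 4x and the attention term by 16x, which is the kind of saving that shorter sequences make available.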

CLIP, one of the pioneering foundation models that connect images and text, has enabled many recent breakthroughs in computer vision. However, its associated training cost is prohibitively high, imposing a significant barrier to its widespread exploration. In this paper, we present a surprising finding that there exists an inverse scaling law for CLIP training, whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training. Moreover, we showcase that the strategy for reducing image/text token length plays a crucial role in determining the quality of this scaling law. As a result of this finding, we are able to successfully train CLIP even with limited computational resources. For example, using 8 A100 GPUs, our CLIP models achieve zero-shot top-1 ImageNet-1k accuracies of 63.2% in ~2 days, 67.8% in ~3 days, and 69.3% in ~4 days. Our method also works well when scaling up -- with G/14, we register a new record of 83.0% ImageNet-1k zero-shot accuracy, and meanwhile accelerate the training by ~33x compared to its OpenCLIP counterpart. By reducing the computation barrier associated with CLIP, we hope to inspire more research in this field, particularly from academics. Our code is available at https://github.com/UCSC-VLAA/CLIPA.
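The abstract does not name the token-reduction strategies it compares. As a minimal sketch of what "reducing image/text token length" can look like, the code below shows two generic options, truncation and random masking, applied to plain lists of token ids or patch indices. All function names are hypothetical, and these are common generic techniques chosen for illustration; they should not be read as the paper's method.

```python
import random
from typing import List, Sequence


def truncate_tokens(token_ids: Sequence[int], max_len: int) -> List[int]:
    """Keep only the first max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return list(token_ids[:max_len])


def random_keep_indices(num_tokens: int, keep: int,
                        rng: random.Random) -> List[int]:
    """Pick `keep` distinct positions uniformly at random and return
    them in their original order."""
    if not 0 < keep <= num_tokens:
        raise ValueError("keep must be in 1..num_tokens")
    return sorted(rng.sample(range(num_tokens), keep))


def random_mask(token_ids: Sequence[int], keep: int,
                rng: random.Random) -> List[int]:
    """Drop tokens at random, preserving the order of the survivors."""
    return [token_ids[i] for i in random_keep_indices(len(token_ids), keep, rng)]


rng = random.Random(0)          # fixed seed so runs are repeatable
text_ids = list(range(32))      # stand-in for 32 text token ids
patch_ids = list(range(256))    # stand-in for a 16x16 grid of image patches

short_text = truncate_tokens(text_ids, 8)
masked_text = random_mask(text_ids, 8, rng)
masked_patches = random_mask(patch_ids, 64, rng)

print(len(short_text), len(masked_text), len(masked_patches))
```

Both strategies produce the same sequence length but keep different information: truncation preserves a contiguous prefix, while random masking samples across the whole input. That difference is one reason the choice of strategy can matter.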

Code Implementations: 1 repo
