CV Aug 1, 2024

Scaling Backwards: Minimal Synthetic Pre-training?

Ryo Nakamura, Ryu Tadokoro, Ryosuke Yamada, Yuki M. Asano, Iro Laina, Christian Rupprecht, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka

arXiv:2408.00677v2 12.810 citations h-index: 28 Has Code

Originality: Incremental advance

AI Analysis

This work addresses data efficiency in computer vision pre-training by demonstrating that a minimal synthetic dataset can match ImageNet-1k pre-training under full fine-tuning. The advance is incremental because it builds on existing pre-training paradigms.

The paper asks whether large-scale real-world datasets are necessary for pre-training in computer vision. It constructs a minimal synthetic dataset from a single fractal with perturbations, which achieves performance on par with ImageNet-1k under full fine-tuning, and it shows that reducing the number of synthetic images from 1k to 1 can even increase pre-training performance.

Pre-training and transfer learning are important building blocks of current computer vision systems. While pre-training is usually performed on large real-world image datasets, in this paper we ask whether this is truly necessary. To this end, we search for a minimal, purely synthetic pre-training dataset that allows us to achieve performance similar to the 1 million images of ImageNet-1k. We construct such a dataset from a single fractal with perturbations. With this, we contribute three main findings. (i) We show that pre-training is effective even with minimal synthetic images, with performance on par with large-scale pre-training datasets like ImageNet-1k for full fine-tuning. (ii) We investigate the single parameter with which we construct artificial categories for our dataset. We find that while the shape differences can be indistinguishable to humans, they are crucial for obtaining strong performance. (iii) Finally, we investigate the minimal requirements for successful pre-training. Surprisingly, we find that a substantial reduction of synthetic images from 1k to 1 can even lead to an increase in pre-training performance, a motivation to further investigate "scaling backwards". In addition, we extend our method from synthetic images to real images to see whether a single real image can show a similar pre-training effect through shape augmentation. We find that the use of grayscale images and affine transformations allows even real images to "scale backwards".
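The idea of building artificial categories from a single fractal with perturbations can be illustrated with a small sketch. This is not the authors' implementation: it assumes the fractal is an iterated function system (IFS) rendered with the chaos game, and that each category is obtained by adding uniform noise of magnitude `delta` to the IFS coefficients. The base maps in `BASE_IFS`, the function names, and all default values are hypothetical choices made for illustration only.

```python
import random

# Hypothetical base fractal: three contractive affine maps (a Sierpinski-style IFS).
# Each map is (a, b, c, d, e, f), applied as x' = a*x + b*y + e, y' = c*x + d*y + f.
BASE_IFS = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]


def perturb_ifs(ifs, delta, rng):
    """Shift every coefficient by uniform noise in [-delta, delta] (assumed scheme)."""
    return [tuple(coef + rng.uniform(-delta, delta) for coef in m) for m in ifs]


def chaos_game(ifs, n_points, rng):
    """Iterate randomly chosen affine maps and collect the visited points."""
    x, y = 0.0, 0.0
    points = []
    for _ in range(n_points):
        a, b, c, d, e, f = rng.choice(ifs)
        x, y = a * x + b * y + e, c * x + d * y + f
        points.append((x, y))
    return points


def rasterize(points, size):
    """Map points onto a binary size x size grid, scaled to their bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1.0  # avoid division by zero
    y_span = (max(ys) - y_min) or 1.0
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(size - 1, int((x - x_min) / x_span * size))
        row = min(size - 1, int((y - y_min) / y_span * size))
        grid[row][col] = 1
    return grid


def make_dataset(n_classes, delta, n_points=2000, size=32, seed=0):
    """Build one image per class, each class being one perturbation of BASE_IFS."""
    rng = random.Random(seed)
    dataset = []
    for label in range(n_classes):
        ifs = perturb_ifs(BASE_IFS, delta, rng)
        image = rasterize(chaos_game(ifs, n_points, rng), size)
        dataset.append((image, label))
    return dataset
```

Under these assumptions, a call such as `make_dataset(1000, 0.05)` would produce 1,000 categories that all derive from one base shape. With a small `delta` the rendered shapes differ only slightly, which mirrors the abstract's remark that the shape differences can be indistinguishable to humans.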

