CV · LG · Nov 29, 2022

Procedural Image Programs for Representation Learning

Manel Baradad, Chun-Fu Chen, Jonas Wulff, Tongzhou Wang, Rogerio Feris, Antonio Torralba, Phillip Isola

MIT

arXiv:2211.16412v2 · 17.638 citations · h-index: 140 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the limited scalability and expert dependency of synthetic data generation for machine learning. It is an incremental improvement over existing methods.

The paper tackles the challenge of scaling up synthetic image generation for representation learning by proposing a dataset of 21,000 procedural programs, which reduces the performance gap between pre-training with real and procedurally generated images by 38%.

Learning image representations using synthetic data allows training neural networks without some of the concerns associated with real images, such as privacy and bias. Existing work focuses on a handful of curated generative processes which require expert knowledge to design, making it hard to scale up. To overcome this, we propose training with a large dataset of twenty-one thousand programs, each one generating a diverse set of synthetic images. These programs are short code snippets, which are easy to modify and fast to execute using OpenGL. The proposed dataset can be used for both supervised and unsupervised representation learning, and reduces the gap between pre-training with real and procedurally generated images by 38%.
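To make the idea of "many short programs, each generating a diverse set of images" concrete, here is a minimal sketch. It is not the authors' code: the paper's programs run on OpenGL, whereas this example uses NumPy as a stand-in, and the names `make_program` and `render`, the sinusoidal pattern, and all parameter ranges are assumptions chosen only for illustration.

```python
import numpy as np

def make_program(program_seed):
    """Return a hypothetical 'program': a function mapping a sample seed to an RGB image."""
    prog_rng = np.random.default_rng(program_seed)
    # Fixed per-program structure: one (x, y) spatial frequency pair per colour channel.
    freqs = prog_rng.uniform(1.0, 8.0, size=(3, 2))

    def render(sample_seed, size=64):
        rng = np.random.default_rng(sample_seed)
        # Per-image randomness: a phase offset for each channel.
        phases = rng.uniform(0.0, 2.0 * np.pi, size=3)
        ys, xs = np.meshgrid(
            np.linspace(0.0, 1.0, size), np.linspace(0.0, 1.0, size), indexing="ij"
        )
        channels = [
            0.5 + 0.5 * np.sin(2.0 * np.pi * (fx * xs + fy * ys) + p)
            for (fx, fy), p in zip(freqs, phases)
        ]
        img = np.stack(channels, axis=-1)  # values in [0, 1]
        return (img * 255).astype(np.uint8)

    return render

# Four illustrative programs (the paper's dataset has about twenty-one thousand).
programs = [make_program(s) for s in range(4)]
# One image per program, stacked into a batch of shape (4, 64, 64, 3).
batch = np.stack([prog(sample_seed=i) for i, prog in enumerate(programs)])
```

In this sketch the program seed fixes the structure of a generator while the sample seed varies the individual images, which loosely mirrors the abstract's description of each program producing its own family of images. Such a batch could feed either a supervised setup (program index as label) or an unsupervised one, though how the paper does this is not detailed here.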
