CV LG

Dec 12, 2022

Synthetic Image Data for Deep Learning

Jason W. Anderson, Marcin Ziolkowski, Ken Kennedy, Amy W. Apon

arXiv:2212.06232v13.79 citations

h-index: 19

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in deep learning for specific domains like vehicle imaging, but it is incremental as it builds on existing synthetic data and augmentation techniques.

The paper tackles the problem of limited real training data for image classification and semantic segmentation by using synthetic images rendered from 3D CAD models. It finds that augmenting real images with synthetic ones improved accuracy over training on real images alone, and that pretraining a base model on synthetic images allowed up to 90% of the model training to be front-loaded, reducing the cost of transfer learning.

Realistic synthetic image data rendered from 3D models can be used to augment image sets and train image classification and semantic segmentation models. In this work, we explore how high-quality physically-based rendering and domain randomization can efficiently create a large synthetic dataset based on production 3D CAD models of a real vehicle. We use this dataset to quantify the effectiveness of synthetic augmentation using U-net and Double-U-net models. We found that, for this domain, synthetic images were an effective technique for augmenting limited sets of real training data. We observed that models trained on purely synthetic images had a very low mean prediction IoU on real validation images. We also observed that adding even very small amounts of real images to a synthetic dataset greatly improved accuracy, and that models trained on datasets augmented with synthetic images were more accurate than those trained on real images alone. Finally, we found that in use cases that benefit from incremental training or model specialization, pretraining a base model on synthetic images provided a sizeable reduction in the training cost of transfer learning, allowing up to 90% of the model training to be front-loaded.
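
The abstract reports results as mean prediction IoU (intersection over union) on real validation images. The abstract does not say how the authors computed it, so the following is only a minimal sketch of the standard definition, assuming binary segmentation masks stored as nested lists of 0/1. The names `iou` and `mean_iou` are hypothetical and not taken from the paper.

```python
def iou(pred, truth):
    """Intersection over union of two equally sized binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1  # pixel is foreground in both masks
            if p or t:
                union += 1  # pixel is foreground in at least one mask
    # Assumption: two empty masks agree perfectly, so treat their IoU as 1.0.
    return inter / union if union else 1.0


def mean_iou(preds, truths):
    """Average the per-image IoU over a validation set."""
    scores = [iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are foreground in either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]

single = iou(pred, truth)
average = mean_iou([pred, truth], [truth, truth])
```

A model that scores well on synthetic validation images but poorly on real ones would show up here as a low `mean_iou` over the real set, which is the gap the abstract describes for models trained on purely synthetic data.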
