Importance of realism in procedurally-generated synthetic images for deep learning: case studies in maize and canola
The paper addresses the cost of acquiring annotated images for training neural networks in crop plant phenotyping by using procedurally generated synthetic images from L-systems. It studies mixtures of real and synthetic training images for maize and canola, and finds that increasing the realism of the synthetic canola images drastically improves prediction on real images.
Artificial neural networks are often used to identify features of crop plants. However, training these models requires many annotated images, which can be expensive and time-consuming to acquire. Procedural models of plants, such as those developed with Lindenmayer systems (L-systems), can produce visually realistic simulations, and hence images of simulated plants for which annotations are implicitly known. These synthetic images can either augment or completely replace real images when training neural networks for phenotyping tasks. In this paper, we systematically vary the amounts of real and synthetic images used for training in both maize and canola to better understand when synthetic images generated from L-systems help prediction on real images. We also explore the degree to which realism in the synthetic images improves prediction. We created five variants of a procedural canola model, whose realism was tuned through calibration, and the deep learning results show how drastically prediction improves as the synthetic canola images are made more realistic. Furthermore, we show how neural network predictions can be used to help calibrate the L-systems themselves, creating a feedback loop.
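Why annotations come "for free" with an L-system can be illustrated with a minimal sketch. It assumes a toy bracketed L-system handled purely as string rewriting, with no geometric interpretation or rendering. The symbols (`A`, `I`, `L`), the single production rule, and the function names are hypothetical and are not taken from the paper's maize or canola models. The point is only that labels such as a leaf count can be read directly off the generated structure instead of being annotated by hand.

```python
from collections import Counter

# Toy bracketed L-system (hypothetical; not the maize or canola models from the paper).
# A = apex (growing tip), I = internode, L = leaf, [ and ] open and close a lateral branch.
AXIOM = "A"
RULES = {"A": "I[L]A"}


def derive(axiom: str, rules: dict[str, str], steps: int) -> str:
    """Rewrite every symbol in parallel `steps` times; symbols without a rule are copied."""
    word = axiom
    for _ in range(steps):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


def max_branch_depth(word: str) -> int:
    """Deepest nesting of [ ] brackets, i.e. the highest branching order present."""
    depth = best = 0
    for symbol in word:
        if symbol == "[":
            depth += 1
            best = max(best, depth)
        elif symbol == "]":
            depth -= 1
    return best


def implicit_annotations(word: str) -> dict[str, int]:
    """Labels read directly from the generated structure; no manual annotation is needed."""
    counts = Counter(word)
    return {
        "leaf_count": counts["L"],
        "internode_count": counts["I"],
        "max_branch_depth": max_branch_depth(word),
    }


if __name__ == "__main__":
    plant = derive(AXIOM, RULES, steps=3)
    print(plant)                        # I[L]I[L]I[L]A
    print(implicit_annotations(plant))  # {'leaf_count': 3, 'internode_count': 3, 'max_branch_depth': 1}
```

In a full pipeline, the derived string would also be interpreted geometrically and rendered into an image; because the image and the labels come from the same derivation, each synthetic image arrives already paired with its annotation.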