Data augmentation in microscopic images for material data mining
This work addresses the high cost of data collection faced by materials science researchers, offering an incremental improvement in data augmentation methods for microscopic image segmentation.
The paper tackles the high cost of collecting experimental data for material data mining by developing a transfer learning strategy that fuses real and simulated images to generate synthetic training data. The results show that a model trained on the synthetic images plus 35% of the real images outperforms one trained on all of the real images, reducing real data preparation time by roughly 65%.
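The 65% figure follows from simple bookkeeping: if only 35% of the real images need to be prepared and generating synthetic images costs almost nothing, about 65% of the real-data preparation time is avoided. The Python sketch below shows that arithmetic together with one plausible way to assemble such a training set. The helper names, the random choice of the real subset, the uniform per-image preparation cost, and the count of 200 placeholder synthetic images are all assumptions for illustration, not details taken from the paper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_training_set(real: Sequence[T], synthetic: Sequence[T],
                       real_fraction: float = 0.35, seed: int = 0) -> List[T]:
    """Hypothetical helper: all synthetic samples plus a random subset of real ones."""
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    rng = random.Random(seed)                      # seeded so the subset is reproducible
    n_real = round(real_fraction * len(real))      # e.g. 35 of 100 real images
    return list(synthetic) + rng.sample(list(real), n_real)


def real_prep_time_saved(real_fraction: float, synth_overhead: float = 0.0) -> float:
    """Fraction of real-data preparation time saved.

    Assumes every real image costs the same to prepare and expresses the
    synthetic-generation time as a fraction of the full real-data cost
    (described as almost negligible, hence the 0.0 default).
    """
    return 1.0 - real_fraction - synth_overhead


real_images = [f"real_{i}" for i in range(100)]        # placeholder identifiers
synthetic_images = [f"synth_{i}" for i in range(200)]  # placeholder identifiers
train = build_training_set(real_images, synthetic_images)
print(len(train))                                # 235 (200 synthetic + 35 real)
print(round(real_prep_time_saved(0.35), 2))      # 0.65
```

If generating the synthetic images took a noticeable share of the time, passing a nonzero `synth_overhead` would lower the savings accordingly.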
Recent progress in material data mining has been driven by high-capacity models trained on large datasets. However, collecting experimental data (real data) is extremely costly because of the human effort and expertise it requires. Here, we develop a novel transfer learning strategy to address the problem of small or insufficient data. This strategy fuses real and simulated data to augment the training data in the data mining procedure. For the specific task of image segmentation, it generates synthetic images by fusing the physical mechanism captured in simulated images with the "image style" of real images. The results show that the model trained on the synthetic images plus 35% of the real images outperforms the model trained on all of the real images. Because the time required to generate synthetic data is almost negligible, this strategy reduces the time cost of real data preparation by roughly 65%.
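The abstract does not specify how the physical structure of a simulated image and the "image style" of a real image are combined. As a rough illustration of the idea only, the following sketch swaps in two simple stand-ins: a Voronoi tessellation plays the role of a simulated grain structure (its boundary mask comes for free as a segmentation label), and classical histogram matching transfers the gray-level distribution of a real micrograph onto it. Neither is the authors' method, which the abstract describes as a transfer learning strategy; the function names and the random placeholder "real" image are hypothetical.

```python
import numpy as np


def match_histogram(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Remap `source` intensities so their CDF follows that of `reference`."""
    _, src_inv, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source gray level, take the reference intensity at the same quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)
    return mapped[src_inv].reshape(source.shape)


def toy_grain_structure(shape, n_grains, rng):
    """Hypothetical stand-in for a simulated microstructure: a Voronoi tessellation."""
    seeds = rng.uniform(0.0, 1.0, size=(n_grains, 2)) * np.array(shape)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    pixels = np.stack([yy, xx], axis=-1).astype(float)               # (H, W, 2)
    dist2 = ((pixels[:, :, None, :] - seeds[None, None]) ** 2).sum(axis=-1)
    labels = dist2.argmin(axis=-1)                                   # grain id per pixel
    mask = np.zeros(shape, dtype=bool)                               # grain-boundary mask
    mask[:-1, :] |= labels[:-1, :] != labels[1:, :]
    mask[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    return labels, mask


def make_synthetic_pair(real_image: np.ndarray, n_grains: int = 12, seed: int = 0):
    """Return (synthetic image, boundary mask) for a 2-D grayscale `real_image`."""
    rng = np.random.default_rng(seed)
    labels, mask = toy_grain_structure(real_image.shape, n_grains, rng)
    shade = rng.uniform(0.6, 1.0, size=n_grains)   # per-grain brightness variation
    simulated = shade[labels]
    simulated[mask] = 0.0                          # boundaries rendered darkest
    styled = match_histogram(simulated, real_image)
    return styled, mask


rng = np.random.default_rng(1)
real = rng.normal(120.0, 25.0, size=(64, 64)).clip(0, 255)  # placeholder "real" micrograph
image, mask = make_synthetic_pair(real, n_grains=12, seed=0)
print(image.shape, mask.mean())
```

Histogram matching transfers only the gray-level distribution, not texture, so it captures far less of a real image's "style" than a learned model would; it is used here only because it is short and deterministic.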