Rapid Training Data Creation by Synthesizing Medical Images for Classification and Localization
This work addresses the shortage of annotated medical data facing researchers and practitioners and offers a practical way to reduce the annotation burden, though the contribution is incremental because it builds on existing data synthesis techniques.
The paper tackles the high cost and difficulty of obtaining exhaustively annotated medical images for AI training by presenting a method that synthesizes training data from real images. On human urine microscopy images, the synthesized data significantly increases localization accuracy for a weakly supervised model and, for a strongly supervised model, yields accuracy that closely parallels that of training on exhaustively annotated real data.
While the use of artificial intelligence (AI) for medical image analysis is gaining wide acceptance, the expertise, time and cost required to generate annotated data in the medical field are high, owing to the limited availability of both data and expert annotators. Strongly supervised object localization models require exhaustively annotated data, meaning that every object of interest in an image is identified. This is difficult to achieve and to verify for medical images. To address these problems, we present a method that transforms real data into training data suitable for any deep neural network. We show the efficacy of this approach on both a weakly supervised localization model and a strongly supervised localization model. For the weakly supervised model, we show that localization accuracy increases significantly when the generated data is used. For the strongly supervised model, the approach removes the need for exhaustive annotation of real images: we show that the accuracy obtained by training on generated images closely parallels the accuracy obtained by training on exhaustively annotated real images. The results are demonstrated on microscopy images of human urine samples.
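The abstract does not spell out how real data is transformed, so the following is only an illustrative sketch of one generic way a synthesized image can be exhaustively annotated by construction: pasting already-labeled object crops onto a canvas and recording every bounding box as it is placed. The function name `synthesize_image`, the blank background, the grayscale patches and the non-overlap rule are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def synthesize_image(patches, labels, canvas_size=(256, 256),
                     n_objects=5, rng=None, max_tries=50):
    """Paste labeled grayscale patches onto a blank canvas (hypothetical helper).

    Returns the composed image and a list of (label, x0, y0, x1, y1) boxes.
    Every pasted object gets a box, so the annotation is exhaustive by
    construction. Assumes each patch is a 2-D uint8 array that fits on the canvas.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = canvas_size
    canvas = np.zeros((height, width), dtype=np.uint8)  # assumed plain background
    boxes = []
    for _obj in range(n_objects):
        idx = int(rng.integers(len(patches)))
        patch = patches[idx]
        ph, pw = patch.shape
        # Try a few random positions; skip the object if none is free.
        for _attempt in range(max_tries):
            y0 = int(rng.integers(0, height - ph + 1))
            x0 = int(rng.integers(0, width - pw + 1))
            x1, y1 = x0 + pw, y0 + ph
            overlaps = any(
                x0 < bx1 and bx0 < x1 and y0 < by1 and by0 < y1
                for (_label, bx0, by0, bx1, by1) in boxes
            )
            if not overlaps:
                canvas[y0:y1, x0:x1] = patch
                boxes.append((labels[idx], x0, y0, x1, y1))
                break
    return canvas, boxes
```

Because the boxes are produced by the same step that places the objects, no object in the generated image can be left unannotated, which is the property a strongly supervised localization model needs. A realistic pipeline would also have to handle backgrounds, blending and object overlap, none of which this sketch attempts.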