CV LG | Aug 26, 2020

Synthetic Sample Selection via Reinforcement Learning

Jiarong Ye, Yuan Xue, L. Rodney Long, Sameer Antani, Zhiyun Xue, Keith Cheng, Xiaolei Huang

arXiv:2008.11331v18.525 citations

Originality: Incremental advance

AI Analysis

This work addresses the shortage of annotated training data for medical image recognition systems, offering a general method to enhance performance with limited annotations, though it is incremental as it builds on existing synthetic image generation techniques.

The paper tackles the problem of quality control for synthetic medical images in data augmentation by proposing a reinforcement learning-based selection method to choose reliable synthetic images, resulting in classification performance improvements of 8.1% and 2.3% on two histopathology datasets.

Synthesizing realistic medical images provides a feasible solution to the shortage of training data in deep-learning-based medical image recognition systems. However, quality control of synthetic images for data augmentation is under-investigated, and some generated images are not realistic and may contain misleading features that distort the data distribution when mixed with real images. Thus, the effectiveness of synthetic images in medical image recognition systems cannot be guaranteed when they are added randomly without quality assurance. In this work, we propose a reinforcement learning (RL) based synthetic sample selection method that learns to choose synthetic images containing reliable and informative features. A transformer-based controller is trained via proximal policy optimization (PPO) using the validation classification accuracy as the reward. The selected images are mixed with the original training data for improved training of image recognition systems. To validate our method, we take pathology image recognition as an example and conduct extensive experiments on two histopathology image datasets. In experiments on a cervical dataset and a lymph node dataset, image classification performance is improved by 8.1% and 2.3%, respectively, when utilizing high-quality synthetic images selected by our RL framework. Our proposed synthetic sample selection method is general and has great potential to boost the performance of various medical image recognition systems given limited annotation.
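The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the controller outputs a keep probability per synthetic image (a Bernoulli policy), and that the advantage is the validation accuracy minus a baseline. The baseline and the names `select_log_prob`, `ppo_clipped_objective`, and `mix_training_set` are hypothetical and do not come from the paper. The transformer controller and the classifier are omitted; only the PPO clipped surrogate and the mixing step are shown.

```python
import math


def select_log_prob(prob_keep, action):
    """Log-probability of a keep (1) or discard (0) action under a Bernoulli policy."""
    p = min(max(prob_keep, 1e-8), 1.0 - 1e-8)  # guard against log(0)
    return math.log(p) if action == 1 else math.log(1.0 - p)


def ppo_clipped_objective(new_probs, old_probs, actions, advantage, clip_eps=0.2):
    """Mean PPO clipped surrogate over a batch of selection decisions.

    new_probs / old_probs: keep probabilities from the current and the
    data-collecting controller. advantage: a scalar, here assumed to be
    (validation accuracy - baseline); the baseline is an assumption.
    """
    total = 0.0
    for p_new, p_old, a in zip(new_probs, old_probs, actions):
        ratio = math.exp(select_log_prob(p_new, a) - select_log_prob(p_old, a))
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * advantage, clipped * advantage)
    return total / len(actions)


def mix_training_set(real, synthetic, actions):
    """Append only the synthetic samples the controller chose to keep."""
    return list(real) + [s for s, a in zip(synthetic, actions) if a == 1]


if __name__ == "__main__":
    actions = [1, 0, 1]
    old_probs = [0.5, 0.5, 0.5]
    new_probs = [0.6, 0.4, 0.7]
    val_acc, baseline = 0.82, 0.80  # illustrative numbers only
    objective = ppo_clipped_objective(new_probs, old_probs, actions, val_acc - baseline)
    print("surrogate objective:", objective)
    print(mix_training_set(["real_0"], ["syn_0", "syn_1", "syn_2"], actions))
```

In a full training loop the controller's parameters would be updated by gradient ascent on this objective, and the classifier would be retrained on the mixed set to produce the next reward.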
