IV | Oct 5, 2023
How Good Are Synthetic Medical Images? An Empirical Study with Lung Ultrasound
Menghan Yu, Sourabh Kulhare, Courosh Mehanian et al.
Acquiring large quantities of data and annotations is known to be effective for developing high-performing deep learning models, but doing so is difficult and expensive in the healthcare context. Adding synthetic training data from generative models offers a low-cost way to address data scarcity, and can also help with data imbalance and patient privacy. In this study, we propose a comprehensive framework that fits seamlessly into model development workflows for medical image analysis. Using datasets of varying size, we demonstrate (i) the benefits of generative models as a data augmentation method; (ii) how adversarial methods can protect patient privacy via data substitution; and (iii) novel performance metrics for these use cases, obtained by testing models on real holdout data. We show that training with both synthetic and real data outperforms training with real data alone, and that models trained solely on synthetic data approach the performance of their real-only counterparts. Code is available at https://github.com/Global-Health-Labs/US-DCGAN.
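A minimal sketch of how the three training regimes compared above (real only, real plus synthetic, synthetic only) might be assembled. It assumes a hypothetical class-conditional sampler `sample_synthetic(label, n)`, for example one wrapping a separately trained generator per class; the function name `build_training_set`, the `mode` strings, and the equal-per-class sampling scheme are illustrative assumptions, not the interface of the linked repository.

```python
from typing import Callable, Tuple

import numpy as np

# Hypothetical sampler interface (assumption): sample_synthetic(label, n) returns
# n generated images for one class, e.g. by wrapping a per-class trained generator.
Sampler = Callable[[int, int], np.ndarray]


def build_training_set(
    real_x: np.ndarray,
    real_y: np.ndarray,
    sample_synthetic: Sampler,
    mode: str,
    synthetic_per_class: int,
    seed: int = 0,
) -> Tuple[np.ndarray, np.ndarray]:
    """Assemble training data for one regime.

    mode = "real"       -> real images only (baseline)
    mode = "augment"    -> real images plus synthetic images (data augmentation)
    mode = "substitute" -> synthetic images only (data substitution, e.g. for privacy)
    """
    if mode == "real":
        x, y = real_x, real_y
    elif mode in ("augment", "substitute"):
        classes = np.unique(real_y)
        # Draw the same number of synthetic images for every class, which can
        # reduce class imbalance in the augmented set.
        syn_x = np.concatenate(
            [sample_synthetic(int(c), synthetic_per_class) for c in classes]
        )
        syn_y = np.repeat(classes, synthetic_per_class)
        if mode == "augment":
            x = np.concatenate([real_x, syn_x])
            y = np.concatenate([real_y, syn_y])
        else:
            x, y = syn_x, syn_y
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Shuffle images and labels together so batches mix real and synthetic samples.
    perm = np.random.default_rng(seed).permutation(len(y))
    return x[perm], y[perm]
```

Whichever regime is used for training, each resulting model would then be scored on the same held-out set of real images, which is what makes the real-only, augmented, and synthetic-only results comparable.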
ET | Aug 23, 2018
Insect cyborgs: Bio-mimetic feature generators improve machine learning accuracy on limited data
Charles B Delahunt, J Nathan Kutz
Machine learning (ML) classifiers always benefit from more informative input features. We seek to auto-generate stronger feature sets in order to address the difficulty that ML methods often face when training data are limited. A wide range of biological neural nets (BNNs) excel at fast learning, implying that they are adept at extracting informative features. We can thus look to BNNs for tools to improve ML performance in this low-data regime. The insect olfactory network learns new odors very rapidly by means of three key elements: a competitive inhibition layer; a high-dimensional, sparse, plastic layer; and Hebbian updates of synaptic weights. In this work, we deployed MothNet, a computational model of the insect olfactory network, as an automatic feature generator: attached as a front-end pre-processor, its Readout Neurons provided new features, derived from the original features, for use by standard ML classifiers. We found that these "insect cyborgs", i.e. classifiers that are part insect model and part ML method, performed significantly better than baseline ML methods alone on a vectorized MNIST dataset. The MothNet feature generator also substantially outperformed other feature-generating methods such as PCA, PLS, and NNs, as well as pre-training to initialize NN weights. Cyborgs improved relative test set accuracy by an average of 6% to 33%, depending on baseline ML accuracy, while the relative reduction in test set error exceeded 50% for ML models with higher baseline accuracy. These results indicate the potential value of BNN-inspired feature generators in the ML context.
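The three structural elements named above can be illustrated with a heavily simplified, hypothetical sketch. It is not the published MothNet model, which simulates noisy network dynamics; it only mimics the layer structure as a static feature pipeline. The class name `MothNetLikeFeatures`, its parameters (`n_kc`, `fan_in`, `k_active`, `inhibition`, `lr`, `decay`), the k-winners-take-all sparsity rule, and the choice to make only the sparse-layer-to-readout weights plastic are all illustrative assumptions.

```python
import numpy as np


class MothNetLikeFeatures:
    """Hypothetical, simplified MothNet-style feature generator (illustration only).

    Mimics three ideas: competitive inhibition, a high-dimensional sparse layer,
    and Hebbian-trained readout weights. Not the published MothNet model.
    """

    def __init__(self, n_inputs, n_classes, n_kc=200, fan_in=3, k_active=20,
                 inhibition=0.5, lr=0.1, decay=0.0, seed=0):
        rng = np.random.default_rng(seed)
        # Sparse random connectivity: each high-dimensional unit samples
        # fan_in distinct inputs with binary weights.
        self.W = np.zeros((n_kc, n_inputs))
        for i in range(n_kc):
            self.W[i, rng.choice(n_inputs, size=fan_in, replace=False)] = 1.0
        self.R = np.zeros((n_kc, n_classes))  # plastic sparse-layer -> readout weights
        self.k_active = k_active
        self.inhibition = inhibition
        self.lr = lr
        self.decay = decay

    def _inhibit(self, X):
        # Competitive inhibition: suppress each sample's features by a fraction
        # of that sample's mean activity, then rectify.
        return np.maximum(X - self.inhibition * X.mean(axis=1, keepdims=True), 0.0)

    def _sparse_code(self, X):
        # Project up to n_kc units, then keep only the k_active strongest per sample.
        act = self._inhibit(X) @ self.W.T
        winners = np.argpartition(act, -self.k_active, axis=1)[:, -self.k_active:]
        mask = np.zeros_like(act, dtype=bool)
        np.put_along_axis(mask, winners, True, axis=1)
        return np.where(mask, act, 0.0)

    def fit(self, X, y):
        # Hebbian update: strengthen weights between co-active sparse units and
        # the readout neuron of each training sample's class (one-hot signal).
        kc = self._sparse_code(X)
        onehot = np.eye(self.R.shape[1])[y]
        self.R *= 1.0 - self.decay
        self.R += self.lr * kc.T @ onehot
        return self

    def readout(self, X):
        # Readout-neuron activity: one new feature per class.
        return self._sparse_code(X) @ self.R

    def cyborg_features(self, X):
        # Original features plus readout features, for any standard ML classifier.
        return np.hstack([X, self.readout(X)])
```

In this sketch the generator only appends features and never replaces the originals, so the "cyborg" is simply `cyborg_features(X)` passed to whatever baseline classifier is already in use; comparing that classifier with and without the extra columns mirrors the baseline-versus-cyborg comparison described in the abstract.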