CV · LG · Aug 16, 2021

Semi-Supervised Siamese Network for Identifying Bad Data in Medical Imaging Datasets

Niamh Belton, Aonghus Lawlor, Kathleen M. Curran

arXiv:2108.07130v1 · 3.75 citations

Originality: Incremental advance

AI Analysis

This work addresses data quality issues in medical imaging for model developers, but the advance is incremental because it builds on existing Siamese network techniques.

The paper tackles the problem of identifying bad data in medical imaging datasets, which can negatively impact model performance, by proposing a semi-supervised Siamese network that achieves an AUC of 0.989 for detection.

Noisy data present in medical imaging datasets can often aid the development of robust models that are equipped to handle real-world data. However, if the bad data contains insufficient anatomical information, it can have a severe negative effect on the model's performance. We propose a novel methodology using a semi-supervised Siamese network to identify bad data. This method requires only a small pool of 'reference' medical images to be reviewed by a non-expert human to ensure the major anatomical structures are present in the Field of View. The model trains on this reference set and identifies bad data by using the Siamese network to compute the distance between the reference set and all other medical images in the dataset. This methodology achieves an Area Under the Curve (AUC) of 0.989 for identifying bad data. Code will be available at https://git.io/JYFuV.
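The distance-based scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the embeddings have already been produced by a trained Siamese network (random toy vectors stand in for them here), and it assumes the mean Euclidean distance to the reference set as the score, since the abstract does not name the metric or the aggregation. The function names `anomaly_scores` and `auc` are hypothetical.

```python
import numpy as np


def anomaly_scores(reference_emb, query_emb):
    """Mean Euclidean distance from each query embedding to the reference set.

    Assumption: embeddings come from an already-trained Siamese network.
    A larger score means the image looks less like the reviewed references.
    """
    # Pairwise differences via broadcasting: (n_query, n_reference, dim)
    diffs = query_emb[:, None, :] - reference_emb[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # (n_query, n_reference)
    return dists.mean(axis=1)               # (n_query,)


def auc(scores, labels):
    """Rank-based AUC: chance that a bad image (label 1) outscores a good one (label 0)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Toy stand-ins for embeddings (illustrative data, not results from the paper).
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 0.1, size=(20, 8))  # reviewed 'reference' images
good = rng.normal(0.0, 0.1, size=(50, 8))       # images similar to the references
bad = rng.normal(3.0, 0.1, size=(10, 8))        # images far from the references
queries = np.vstack([good, bad])
labels = np.array([0] * 50 + [1] * 10)

scores = anomaly_scores(reference, queries)
print(f"toy AUC: {auc(scores, labels):.3f}")
```

In practice a threshold on the score would flag candidate bad images for removal or review; how that threshold is chosen is not stated in the abstract.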
