Deep Quality Estimation: Creating Surrogate Models for Human Quality Ratings
This work addresses the need for efficient quality estimation in medical imaging, particularly for clinical translation and dataset curation. Its contribution is incremental: it applies existing methods to a new domain with scarce data.
The paper tackles the problem of approximating human quality ratings for segmentation tasks by training surrogate models, achieving prediction accuracy within a margin of error comparable to human intra-rater reliability on a glioma segmentation dataset with expert ratings.
Human ratings are abstract representations of segmentation quality. To approximate human quality ratings on scarce expert data, we train surrogate quality estimation models. We evaluate on a complex multi-class segmentation problem, specifically glioma segmentation, following the BraTS annotation protocol. The training data features quality ratings from 15 expert neuroradiologists on a scale ranging from 1 to 6 stars for various computer-generated and manual 3D annotations. Even though the networks operate on 2D images and with scarce training data, we can approximate segmentation quality within a margin of error comparable to human intra-rater reliability. Segmentation quality prediction has broad applications. An understanding of segmentation quality is imperative for the successful clinical translation of automatic segmentation algorithms, and it can also play an essential role in training new segmentation models. Because inference takes only a split second, the surrogate model can be applied directly within a loss function or as a fully automatic dataset curation mechanism in a federated learning setting.
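As a rough illustration of two ideas in the abstract, the sketch below compares a surrogate model's error against an intra-rater error and then filters a dataset by predicted rating. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the use of mean absolute error as the agreement measure, the 4-star threshold, and all rating values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def mean_absolute_error(predicted, reference):
    """Mean absolute difference between two equally long rating lists."""
    if len(predicted) != len(reference):
        raise ValueError("rating lists must have the same length")
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def curate(cases, predicted_stars, min_stars=4.0):
    """Keep only the cases whose predicted rating reaches min_stars.

    min_stars is a hypothetical threshold on the 1-to-6 star scale.
    """
    return [c for c, s in zip(cases, predicted_stars) if s >= min_stars]


# Made-up illustrative values, not results from the paper.
expert_first = [5, 3, 6, 2]           # one expert's ratings
expert_repeat = [5, 4, 6, 1]          # the same expert rating the same cases again
surrogate = [4.5, 3.5, 5.5, 2.5]      # hypothetical surrogate model outputs

# Assumption: intra-rater reliability is summarized here as a mean absolute error.
intra_rater_mae = mean_absolute_error(expert_repeat, expert_first)
surrogate_mae = mean_absolute_error(surrogate, expert_first)

# Automatic curation: drop segmentations the surrogate rates below the threshold.
kept = curate(["case_a", "case_b", "case_c", "case_d"], surrogate)
```

In this toy setup the surrogate would count as "comparable to intra-rater reliability" when `surrogate_mae` is close to `intra_rater_mae`; how that comparison is actually defined and measured is a matter for the paper's evaluation.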