CV, AI (Apr 12, 2021)

Learning from Subjective Ratings Using Auto-Decoded Deep Latent Embeddings

Bowen Li, Xinping Ren, Ke Yan, Le Lu, Lingyun Huang, Guotong Xie, Jing Xiao, Dar-In Tai, Adam P. Harrison

arXiv:2104.05570v3 1.4

Originality: Incremental advance

AI Analysis

This paper addresses the fundamental issue of label subjectivity in medical imaging analysis, particularly for computer-aided diagnosis. The contribution is incremental, as it builds on existing methods for handling annotator noise.

The paper tackles high variability in radiological diagnoses by introducing auto-decoded deep latent embeddings (ADDLE), which model rater-specific effects. It reports a 10.5% improvement in partial AUCs for diagnosing severe liver steatosis over standard classifiers.

Depending on the application, radiological diagnoses can be associated with high inter- and intra-rater variabilities. Most computer-aided diagnosis (CAD) solutions treat such data as incontrovertible, exposing learning algorithms to considerable and possibly contradictory label noise and biases. Thus, managing subjectivity in labels is a fundamental problem in medical imaging analysis. To address this challenge, we introduce auto-decoded deep latent embeddings (ADDLE), which explicitly models the tendencies of each rater using an auto-decoder framework. After a simple linear transformation, the latent variables can be injected into any backbone at any and multiple points, allowing the model to account for rater-specific effects on the diagnosis. Importantly, ADDLE does not expect multiple raters per image in training, meaning it can readily learn from data mined from hospital archives. Moreover, the complexity of training ADDLE does not increase as more raters are added. During inference each rater can be simulated and a 'mean' or 'greedy' virtual rating can be produced. We test ADDLE on the problem of liver steatosis diagnosis from 2D ultrasound (US) by collecting 46 084 studies along with clinical US diagnoses originating from 65 different raters. We evaluated diagnostic performance using a separate dataset with gold-standard biopsy diagnoses. ADDLE can improve the partial areas under the curve (AUCs) for diagnosing severe steatosis by 10.5% over standard classifiers while outperforming other annotator-noise approaches, including those requiring 65 times the parameters.
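The mechanism described in the abstract (one learned latent vector per rater, a linear transformation, injection into the backbone, and a 'mean' virtual rating at inference) can be illustrated with a toy sketch. This is not the authors' implementation: the backbone is replaced by a single logistic unit, the injection is assumed to be an additive shift on the features, and all names and sizes (`latents`, `inject`, `head`, `NUM_RATERS`, `LATENT_DIM`, `FEAT_DIM`) are hypothetical. The 'greedy' rating is not defined in the abstract and is left out.

```python
import math
import random

random.seed(0)

NUM_RATERS = 3   # hypothetical; the study itself reports 65 raters
LATENT_DIM = 2   # hypothetical latent size
FEAT_DIM = 4     # hypothetical size of the backbone's feature vector

# Auto-decoder: one free latent vector per rater, optimized directly (no encoder).
latents = [[0.0] * LATENT_DIM for _ in range(NUM_RATERS)]
# Linear transformation mapping a rater latent to a shift on the features (assumed form).
inject = [[random.gauss(0, 0.5) for _ in range(LATENT_DIM)] for _ in range(FEAT_DIM)]
# Stand-in for the backbone's classification head: a single logistic unit.
head = [random.gauss(0, 0.5) for _ in range(FEAT_DIM)]


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def shifted_features(features, rater):
    """Add the rater-specific shift (inject @ latent) to the image features."""
    z = latents[rater]
    return [f + sum(w * zk for w, zk in zip(row, z)) for f, row in zip(features, inject)]


def predict(features, rater):
    """Probability that the given rater assigns the positive label."""
    s = shifted_features(features, rater)
    return sigmoid(sum(h * si for h, si in zip(head, s)))


def sgd_step(features, rater, label, lr=0.1):
    """One cross-entropy gradient step on head, inject, and this rater's latent."""
    z = latents[rater]
    s = shifted_features(features, rater)
    g = predict(features, rater) - label  # d(loss)/d(logit)
    # Compute the latent gradient before any parameter is changed.
    grad_z = [g * sum(head[i] * inject[i][k] for i in range(FEAT_DIM))
              for k in range(LATENT_DIM)]
    for i in range(FEAT_DIM):
        grad_head_i = g * s[i]
        for k in range(LATENT_DIM):
            inject[i][k] -= lr * g * head[i] * z[k]
        head[i] -= lr * grad_head_i
    for k in range(LATENT_DIM):
        z[k] -= lr * grad_z[k]


def mean_virtual_rating(features):
    """Simulate every rater and average their predicted probabilities."""
    return sum(predict(features, r) for r in range(NUM_RATERS)) / NUM_RATERS


# Toy data: two raters disagree on the same image. Only the rater index differs,
# so the disagreement has to be absorbed by the per-rater latents.
image = [0.2, -0.1, 0.4, 0.3]
for _ in range(200):
    sgd_step(image, rater=0, label=1)
    sgd_step(image, rater=1, label=0)

print(predict(image, 0), predict(image, 1), mean_virtual_rating(image))
```

The toy data reuses one image for two raters only to make the rater effect visible; as the abstract notes, ADDLE itself does not require multiple raters per image. Each step touches only the latent of the rater who produced that label, so adding raters adds rows to `latents` without changing the per-step update.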
