LG CV IV ML
Nov 2, 2019

On Modelling Label Uncertainty in Deep Neural Networks: Automatic Estimation of Intra-observer Variability in 2D Echocardiography Quality Assessment

Zhibin Liao, Hany Girgis, Amir Abdi, Hooman Vaseli, Jorden Hetherington, Robert Rohling, Ken Gin, Teresa Tsang, Purang Abolmaesumi

arXiv:1911.00674v1 7.753 citations

Originality: Incremental advance

AI Analysis

This paper addresses reliability issues in deep learning for medical imaging, specifically in echocardiography quality assessment. The advance is incremental because it builds on existing uncertainty modelling techniques.

The paper tackles label uncertainty caused by intra-observer variability in 2D echocardiography quality assessment by modelling it as an aleatoric uncertainty regression problem. The proposed method reduces the absolute error from 0.11 ± 0.09 to 0.09 ± 0.08 (on a scale from 0 to 1), which the authors report as equivalent to a 5.7% test accuracy improvement.

Uncertainty of labels in clinical data resulting from intra-observer variability can have a direct impact on the reliability of assessments made by deep neural networks. In this paper, we propose a method for modelling such uncertainty in the context of 2D echocardiography (echo), which is a routine procedure for detecting cardiovascular disease at point-of-care. Echo imaging quality and acquisition time are highly dependent on the operator's experience level. Recent developments have shown the possibility of automating echo image quality quantification by mapping an expert's assessment of quality to the echo image via deep learning techniques. Nevertheless, the observer variability in the expert's assessment can impact the quality quantification accuracy. Here, we aim to model the intra-observer variability in echo quality assessment as an aleatoric uncertainty modelling regression problem, introducing a novel method that handles the regression problem with categorical labels. A key feature of our design is that a single forward pass is sufficient to estimate the level of uncertainty for the network output. Compared to the $0.11 \pm 0.09$ absolute error (on a scale from 0 to 1) achieved by the conventional regression method, the proposed method brings the error down to $0.09 \pm 0.08$, where the improvement is statistically significant and is equivalent to a $5.7\%$ test accuracy improvement. The simplicity of the proposed approach means that it could be generalized to other applications of deep learning in medical imaging, where there is often uncertainty in clinical labels.
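
To make the idea of aleatoric uncertainty regression with a single forward pass more concrete, the sketch below uses the widely used Gaussian negative log-likelihood formulation, in which a model outputs both a predicted quality score and a log-variance for each input. This is an illustrative assumption only: the abstract does not give the paper's loss, and the sketch does not reproduce the proposed handling of categorical labels. The function name `heteroscedastic_nll` and the example values are hypothetical.

```python
import numpy as np


def heteroscedastic_nll(y_true, mu, log_var):
    """Per-sample Gaussian negative log-likelihood (constant term dropped).

    y_true  : target quality score, assumed scaled to [0, 1]
    mu      : predicted mean score
    log_var : predicted log-variance, i.e. the aleatoric uncertainty estimate

    Illustrative only; not the loss used in the paper.
    """
    y_true, mu, log_var = (np.asarray(a, dtype=float) for a in (y_true, mu, log_var))
    # The squared residual is down-weighted when the predicted variance is high,
    # while the 0.5 * log_var term penalises claiming high variance everywhere.
    return 0.5 * np.exp(-log_var) * (y_true - mu) ** 2 + 0.5 * log_var


# A label that disagrees with the prediction (residual of 0.5):
# admitting more uncertainty gives a lower loss than being overconfident.
loss_overconfident = heteroscedastic_nll(0.75, 0.25, np.log(0.01))
loss_uncertain = heteroscedastic_nll(0.75, 0.25, np.log(0.25))

# A label that matches the prediction exactly:
# here the lower predicted variance is rewarded instead.
loss_exact_confident = heteroscedastic_nll(0.5, 0.5, np.log(0.01))
loss_exact_uncertain = heteroscedastic_nll(0.5, 0.5, np.log(0.25))
```

Under this kind of loss, the variance output is produced in the same forward pass as the mean, so no repeated sampling is required at inference time. Inputs whose labels are inconsistent between repeated assessments would be expected to receive a larger predicted variance.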
