CV AI

Apr 15, 2019

A deep learning framework for quality assessment and restoration in video endoscopy

Sharib Ali, Felix Zhou, Adam Bailey, Barbara Braden, James East, Xin Lu, Jens Rittscher

arXiv:1904.07073v1 14.4140 citations

Originality: Incremental advance

AI Analysis

This work addresses a fundamental medical imaging problem for clinical applications by providing a comprehensive solution for artifact handling in endoscopy, though it is incremental because it builds on existing detection and restoration methods.

The paper tackles the problem of multiple artifacts in endoscopy videos by proposing a deep learning framework for detection, quality assessment, and restoration. It achieves a mean average precision of 49.0 and preserves an average of 68.7% of frames, which is 25% more than are retained from the raw videos.

Endoscopy is a routine imaging technique used for both diagnosis and minimally invasive surgical treatment. Artifacts such as motion blur, bubbles, specular reflections, floating objects and pixel saturation impede the visual interpretation and the automated analysis of endoscopy videos. Given the widespread use of endoscopy in different clinical applications, we contend that the robust and reliable identification of such artifacts and the automated restoration of corrupted video frames is a fundamental medical imaging problem. Existing state-of-the-art methods only deal with the detection and restoration of selected artifacts. However, endoscopy videos typically contain numerous artifacts, which motivates a comprehensive solution. We propose a fully automatic framework that can: 1) detect and classify six different primary artifacts, 2) provide a quality score for each frame and 3) restore mildly corrupted frames. To detect different artifacts, our framework exploits a fast, multi-scale, single-stage convolutional neural network detector. We introduce a quality metric to assess frame quality and predict image restoration success. Generative adversarial networks with carefully chosen regularization are finally used to restore corrupted frames. Our detector yields the highest mean average precision (mAP at 5% threshold) of 49.0 and the lowest computational time of 88 ms, allowing for accurate real-time processing. Our restoration models for blind deblurring, saturation correction and inpainting demonstrate significant improvements over previous methods. On a set of 10 test videos, we show that our approach preserves an average of 68.7% of frames, which is 25% more than are retained from the raw videos.
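The abstract describes a three-way use of the per-frame quality score: clean frames are kept, mildly corrupted frames are sent to restoration, and the rest are dropped. The sketch below is only an illustration of that idea, not the authors' implementation. The `Frame` class, the `triage` and `retained_fraction` helpers, the score range and both thresholds are hypothetical, and it assumes every frame routed to restoration is successfully recovered.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    index: int
    quality: float  # hypothetical quality score in [0, 1]; higher means cleaner


def triage(frames: List[Frame],
           keep_at: float = 0.9,
           restore_at: float = 0.5) -> Dict[str, List[int]]:
    """Split frame indices into keep / restore / discard buckets.

    Both thresholds are illustrative placeholders, not values from the paper.
    """
    buckets: Dict[str, List[int]] = {"keep": [], "restore": [], "discard": []}
    for frame in frames:
        if frame.quality >= keep_at:
            buckets["keep"].append(frame.index)      # usable as-is
        elif frame.quality >= restore_at:
            buckets["restore"].append(frame.index)   # mildly corrupted
        else:
            buckets["discard"].append(frame.index)   # too corrupted to recover
    return buckets


def retained_fraction(buckets: Dict[str, List[int]], total: int) -> float:
    """Fraction of frames preserved, assuming every restoration succeeds."""
    return (len(buckets["keep"]) + len(buckets["restore"])) / total


frames = [Frame(0, 0.95), Frame(1, 0.70), Frame(2, 0.20), Frame(3, 0.55)]
buckets = triage(frames)
print(buckets)
print(retained_fraction(buckets, len(frames)))
```

Under these assumptions, the retained fraction over a whole video plays the role of the preserved-frame percentage reported in the abstract, and the share of frames in the `keep` bucket alone corresponds to what survives without any restoration.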

