IVCVJun 19, 2023

Deep Learning Framework with Multi-Head Dilated Encoders for Enhanced Segmentation of Cervical Cancer on Multiparametric Magnetic Resonance Imaging

arXiv:2306.11137v15 citationsh-index: 57
Originality Incremental advance
AI Analysis

This work addresses segmentation accuracy for cervical cancer diagnosis using MRI, but it is incremental as it builds on existing U-Net architectures with specific modifications for multimodal data.

The paper tackled the challenge of segmenting cervical cancer tumors from misaligned multiparametric MRI images by proposing a multi-head framework with dilated convolutions, achieving a median Dice coefficient of 0.823, which outperformed a baseline model (0.788) though not statistically significant.

T2-weighted magnetic resonance imaging (MRI) and diffusion-weighted imaging (DWI) are essential components for cervical cancer diagnosis. However, combining these channels for training deep learning models are challenging due to misalignment of images. Here, we propose a novel multi-head framework that uses dilated convolutions and shared residual connections for separate encoding of multiparametric MRI images. We employ a residual U-Net model as a baseline, and perform a series of architectural experiments to evaluate the tumor segmentation performance based on multiparametric input channels and feature encoding configurations. All experiments were performed using a cohort including 207 patients with locally advanced cervical cancer. Our proposed multi-head model using separate dilated encoding for T2W MRI, and combined b1000 DWI and apparent diffusion coefficient (ADC) images achieved the best median Dice coefficient similarity (DSC) score, 0.823 (95% confidence interval (CI), 0.595-0.797), outperforming the conventional multi-channel model, DSC 0.788 (95% CI, 0.568-0.776), although the difference was not statistically significant (p>0.05). We investigated channel sensitivity using 3D GRAD-CAM and channel dropout, and highlighted the critical importance of T2W and ADC channels for accurate tumor segmentations. However, our results showed that b1000 DWI had a minor impact on overall segmentation performance. We demonstrated that the use of separate dilated feature extractors and independent contextual learning improved the model's ability to reduce the boundary effects and distortion of DWI, leading to improved segmentation performance. Our findings can have significant implications for the development of robust and generalizable models that can extend to other multi-modal segmentation applications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes