Sharp U-Net: Depthwise Convolutional Network for Biomedical Image Segmentation
Sharp U-Net addresses segmentation accuracy issues in biomedical imaging and offers a parameter-efficient improvement over existing methods, though the contribution is incremental because it builds directly on U-Net.
The paper tackles blurred feature maps and over- and under-segmented target regions in U-Net for biomedical image segmentation. It proposes Sharp U-Net, which applies a depthwise convolution with a sharpening kernel to the encoder features before they are fused with the decoder features. On six datasets, the model consistently outperforms or matches state-of-the-art baselines while adding no extra learnable parameters.
The U-Net architecture, built upon the fully convolutional network, has proven to be effective in biomedical image segmentation. However, U-Net applies skip connections to merge semantically different low- and high-level convolutional features, resulting in not only blurred feature maps, but also over- and under-segmented target regions. To address these limitations, we propose a simple, yet effective end-to-end depthwise encoder-decoder fully convolutional network architecture, called Sharp U-Net, for binary and multi-class biomedical image segmentation. The key rationale of Sharp U-Net is that instead of applying a plain skip connection, a depthwise convolution of the encoder feature map with a sharpening kernel filter is employed prior to merging the encoder and decoder features, thereby producing a sharpened intermediate feature map of the same size as the encoder map. Using this sharpening filter layer, we are able to not only fuse semantically less dissimilar features, but also to smooth out artifacts throughout the network layers during the early stages of training. Our extensive experiments on six datasets show that the proposed Sharp U-Net model consistently outperforms or matches the recent state-of-the-art baselines in both binary and multi-class segmentation tasks, while adding no extra learnable parameters. Furthermore, Sharp U-Net outperforms baselines that have more than three times the number of learnable parameters.
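The sharpened skip connection described in the abstract can be illustrated with a minimal, framework-free sketch in plain Python. It is not the authors' implementation. The kernel values are an assumption (a common 3x3 sharpening kernel, since the abstract does not state them), and the function names `depthwise_sharpen` and `sharp_skip`, the zero "same" padding, and the channels-first nested-list layout are all hypothetical choices made for illustration.

```python
# Assumed 3x3 sharpening kernel; the abstract does not give the values.
# The kernel is fixed (not trained), so it adds no learnable parameters.
SHARPEN_KERNEL = [
    [0.0, -1.0, 0.0],
    [-1.0, 5.0, -1.0],
    [0.0, -1.0, 0.0],
]


def depthwise_sharpen(feature_map, kernel=SHARPEN_KERNEL):
    """Filter each channel independently with the same fixed kernel.

    feature_map: nested list shaped [channels][height][width].
    Zero padding keeps the output the same size as the input.
    The kernel is symmetric, so correlation and convolution coincide.
    """
    k = len(kernel)
    pad = k // 2
    out = []
    for channel in feature_map:  # depthwise: no mixing across channels
        h, w = len(channel), len(channel[0])
        sharpened = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        r, c = i + di - pad, j + dj - pad
                        if 0 <= r < h and 0 <= c < w:  # zero padding outside
                            acc += kernel[di][dj] * channel[r][c]
                sharpened[i][j] = acc
        out.append(sharpened)
    return out


def sharp_skip(encoder_map, decoder_map):
    """Sharpen the encoder features, then concatenate along the channel axis."""
    return depthwise_sharpen(encoder_map) + decoder_map


# Usage: two encoder channels fused with one decoder channel on a 3x3 grid.
encoder = [[[1.0] * 3 for _ in range(3)], [[2.0] * 3 for _ in range(3)]]
decoder = [[[0.0] * 3 for _ in range(3)]]
fused = sharp_skip(encoder, decoder)
print(len(fused), len(fused[0]), len(fused[0][0]))
```

In a deep learning framework the same idea would presumably be expressed as a depthwise convolution layer whose weights are set to the fixed kernel and excluded from training, which is why the operation leaves the parameter count unchanged.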