Implicit Integration of Superpixel Segmentation into Fully Convolutional Networks
This addresses the problem of efficiently combining superpixels with CNNs for computer vision researchers, offering an incremental improvement by simplifying integration without altering feed-forward paths.
The paper tackles the challenge of integrating superpixels with convolutional neural networks (CNNs) in an end-to-end fashion without extra models or graph convolutions, resulting in a method that speeds up architectures and improves prediction accuracy in tasks like semantic segmentation and depth estimation.
Superpixels are a useful representation to reduce the complexity of image data. However, to combine superpixels with convolutional neural networks (CNNs) in an end-to-end fashion, one requires extra models to generate superpixels and special operations such as graph convolution. In this paper, we propose a way to implicitly integrate a superpixel scheme into CNNs, which makes it easy to use superpixels with CNNs in an end-to-end fashion. Our proposed method hierarchically groups pixels at downsampling layers and generates superpixels. Our method can be plugged into many existing architectures without a change in their feed-forward path because our method does not use superpixels in the feed-forward path but use them to recover the lost resolution instead of bilinear upsampling. As a result, our method preserves detailed information such as object boundaries in the form of superpixels even when the model contains downsampling layers. We evaluate our method on several tasks such as semantic segmentation, superpixel segmentation, and monocular depth estimation, and confirm that it speeds up modern architectures and/or improves their prediction accuracy in these tasks.