Tree-structured Kronecker Convolutional Network for Semantic Segmentation
This work addresses semantic segmentation in computer vision, offering incremental improvements through two architectural components: a Kronecker convolution and a Tree-structured Feature Aggregation (TFA) module.
The paper observes that atrous convolution, widely used in semantic segmentation to enlarge receptive fields, neglects partial information. It proposes a Kronecker convolution that captures these partial features and enlarges the receptive field without extra parameters, together with a Tree-structured Feature Aggregation (TFA) module for multi-scale learning. The resulting network achieves competitive results on PASCAL VOC 2012, PASCAL-Context, and Cityscapes.
Most existing semantic segmentation methods employ atrous convolution to enlarge the receptive field of filters, but neglect partial information. To tackle this issue, we first propose a novel Kronecker convolution, which uses the Kronecker product to expand the standard convolutional kernel so that it takes into account the partial features neglected by atrous convolution. It can therefore capture partial information and enlarge the receptive field of filters simultaneously, without introducing extra parameters. Second, we propose a Tree-structured Feature Aggregation (TFA) module, which expands according to a recursive rule and forms a hierarchical structure. It can thus naturally learn representations of multi-scale objects and encode hierarchical contextual information in complex scenes. Finally, we design Tree-structured Kronecker Convolutional Networks (TKCN), which employ Kronecker convolution and the TFA module. Extensive experiments on three datasets, PASCAL VOC 2012, PASCAL-Context, and Cityscapes, verify the effectiveness of our proposed approach. We make the code and the trained model publicly available at https://github.com/wutianyiRosun/TKCN.
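To illustrate how a Kronecker product can expand a kernel without adding parameters, the following is a minimal NumPy sketch, not the released implementation. The abstract does not specify the expansion matrix, so the sketch assumes an r1 x r1 block whose top-left r2 x r2 corner is ones and whose remaining entries are zeros; the function name `expand_kernel` and the parameter names `r1` and `r2` are hypothetical. Under this assumption, setting r2 = 1 places the weights on a dilated grid, while r2 > 1 lets each weight also cover neighbouring positions.

```python
import numpy as np


def expand_kernel(kernel, r1, r2):
    """Expand a k x k kernel with a Kronecker product.

    Assumption (not stated in the abstract): the expansion matrix is an
    r1 x r1 block whose top-left r2 x r2 corner is ones, zeros elsewhere.
    """
    if not 1 <= r2 <= r1:
        raise ValueError("expected 1 <= r2 <= r1")
    block = np.zeros((r1, r1), dtype=kernel.dtype)
    block[:r2, :r2] = 1
    # Each weight kernel[i, j] is copied into an r2 x r2 patch of the
    # expanded kernel, starting at position (i * r1, j * r1).
    return np.kron(kernel, block)


# A 3 x 3 kernel: 9 learnable weights in every case below.
kernel = np.arange(1.0, 10.0).reshape(3, 3)

# r2 = 1: one tap per weight, spaced r1 apart (dilated placement).
atrous_like = expand_kernel(kernel, r1=3, r2=1)

# r2 = 2: each weight covers a 2 x 2 patch, so neighbouring positions
# skipped by the dilated placement also contribute.
kronecker = expand_kernel(kernel, r1=3, r2=2)

print("learnable weights:", kernel.size)
print("r2=1:", atrous_like.shape, np.count_nonzero(atrous_like), "taps")
print("r2=2:", kronecker.shape, np.count_nonzero(kronecker), "taps")
```

In both cases the number of learnable weights stays at `kernel.size`; only the number of input positions each weight touches changes, which is the sense in which the expansion adds no parameters in this sketch.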