CV | May 27, 2020

GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in Pixel Labeling

Zhuoying Wang, Yongtao Wang, Zhi Tang, Yangyan Li, Ying Chen, Haibin Ling, Weisi Lin

arXiv:2005.13363v21.2 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses performance bottlenecks in pixel labeling tasks such as human pose estimation and semantic segmentation, offering a plug-and-play operation that yields incremental improvements.

The paper tackles scale-confused features in pixel labeling by proposing the Gated Scale-Transfer Operation (GSTO), which improves multi-scale feature learning. Plugged into HRNet, it reports state-of-the-art results on COCO for human pose estimation and on Cityscapes for semantic segmentation, with negligible extra computational cost.

Existing CNN-based methods for pixel labeling depend heavily on multi-scale features to meet the requirements of both semantic comprehension and detail preservation. State-of-the-art pixel labeling networks widely use conventional scale-transfer operations, i.e., up-sampling and down-sampling, to learn multi-scale features. In this work, we find that these operations lead to scale-confused features and suboptimal performance because they are spatially invariant and directly transfer all feature information across scales without spatial selection. To address this issue, we propose the Gated Scale-Transfer Operation (GSTO), which transfers spatially filtered features to another scale. GSTO can work either with or without extra supervision: unsupervised GSTO is learned from the feature itself, while the supervised one is guided by the supervised probability matrix. Both forms of GSTO are lightweight and plug-and-play, and can be flexibly integrated into networks or modules to learn better multi-scale features. In particular, by plugging GSTO into HRNet, we get a more powerful backbone (namely GSTO-HRNet) for pixel labeling, which achieves new state-of-the-art results on the COCO benchmark for human pose estimation and on semantic segmentation benchmarks including Cityscapes, LIP and Pascal Context, with negligible extra computational cost. Moreover, experimental results demonstrate that GSTO can also significantly boost the performance of multi-scale feature aggregation modules like PPM and ASPP. Code will be made available at https://github.com/VDIGPKU/GSTO.
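
The gating idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see their repository for that). It assumes the gate is a single-channel spatial map in (0, 1) that multiplies the feature before a conventional resampling step. The function names (`spatial_gate`, `downsample2x`, `upsample2x`, `gated_scale_transfer`), the 1x1-projection-plus-sigmoid form of the unsupervised gate, and the average-pooling and nearest-neighbor resamplers are all illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def spatial_gate(feat, weight, bias=0.0):
    """Hypothetical unsupervised gate computed from the feature itself.

    feat:   array of shape (C, H, W)
    weight: array of shape (C,), acting like a 1x1 convolution to one channel
    Returns a gate of shape (1, H, W) with values in (0, 1).
    """
    logits = np.tensordot(weight, feat, axes=([0], [0])) + bias  # (H, W)
    return sigmoid(logits)[None, :, :]


def downsample2x(feat):
    """2x2 average pooling (assumes H and W are even)."""
    c, h, w = feat.shape
    return feat.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2x(feat):
    """Nearest-neighbor up-sampling by a factor of 2."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)


def gated_scale_transfer(feat, gate, transfer):
    """Filter the feature spatially with the gate, then change its scale."""
    return transfer(feat * gate)


# Unsupervised variant (assumed form): the gate comes from the feature.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
weight = rng.standard_normal(8)
low_res = gated_scale_transfer(feat, spatial_gate(feat, weight), downsample2x)

# Supervised variant (assumed form): the gate is a probability map that an
# auxiliary supervised head would produce; here it is a random placeholder.
prob = rng.uniform(size=(1, 16, 16))
high_res = gated_scale_transfer(feat, prob, upsample2x)
```

With a gate of all ones the operation reduces to the plain scale transfer, which is the spatially invariant behavior the abstract argues against. In a trained network the gate weights would be learned jointly with the backbone.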

