CV, Jun 13, 2020

Split-Merge Pooling

arXiv:2006.07742v1
Originality: Incremental advance
AI Analysis

The paper addresses spatial information loss in semantic segmentation for computer vision applications. It proposes a new pooling method that is an incremental advance over existing pooling techniques.

The paper tackles the loss of spatial information in dense prediction tasks like semantic segmentation by introducing Split-Merge pooling, which fully preserves spatial resolution without subsampling while achieving a large receptive field. Results show significant accuracy improvements on Cityscapes and GTA-5 datasets when replacing max-pooling and striding convolutions in ResNet variants.

There are a variety of approaches to obtain a vast receptive field with convolutional neural networks (CNNs), such as pooling or striding convolutions. Most of these approaches were initially designed for image classification and later adapted to dense prediction tasks, such as semantic segmentation. However, the major drawback of this adaptation is the loss of spatial information. Even the popular dilated convolution approach, which in theory is able to operate with full spatial resolution, needs to subsample features for large image sizes in order to make the training and inference tractable. In this work, we introduce Split-Merge pooling to fully preserve the spatial information without any subsampling. By applying Split-Merge pooling to deep networks, we achieve, at the same time, a very large receptive field. We evaluate our approach for dense semantic segmentation of large image sizes taken from the Cityscapes and GTA-5 datasets. We demonstrate that by replacing max-pooling and striding convolutions with our Split-Merge pooling, we are able to improve the accuracy of different variations of ResNet significantly.
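The abstract does not spell out how the split and merge steps work. The following is a minimal sketch of one plausible reading, under the assumption that splitting rearranges each k x k window into k*k lower-resolution sub-maps (so no value is discarded) and merging interleaves them back to full resolution. The names `split_pool`, `merge_pool`, and `max_pool` are hypothetical and are not taken from the paper; plain nested lists stand in for a single-channel feature map.

```python
def split_pool(fmap, k=2):
    """Split an H x W map into k*k sub-maps of size (H/k) x (W/k).

    Sub-map (i, j) holds the element at offset (i, j) of every k x k window,
    so each input value lands in exactly one sub-map and nothing is dropped.
    This is an assumed reading of the split step, not the paper's code.
    """
    h, w = len(fmap), len(fmap[0])
    if h % k or w % k:
        raise ValueError("height and width must be divisible by k")
    return [
        [row[j::k] for row in fmap[i::k]]
        for i in range(k)
        for j in range(k)
    ]


def merge_pool(parts, k=2):
    """Inverse of split_pool: interleave k*k sub-maps back to full resolution."""
    sub_h, sub_w = len(parts[0]), len(parts[0][0])
    out = [[None] * (sub_w * k) for _ in range(sub_h * k)]
    for idx, part in enumerate(parts):
        i, j = divmod(idx, k)  # window offset this sub-map came from
        for r, row in enumerate(part):
            for c, value in enumerate(row):
                out[r * k + i][c * k + j] = value
    return out


def max_pool(fmap, k=2):
    """Conventional k x k max-pooling with stride k, shown for comparison."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[r + i][c + j] for i in range(k) for j in range(k))
         for c in range(0, w, k)]
        for r in range(0, h, k)
    ]


# A 4 x 4 map holding the values 0..15 in row-major order.
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]

parts = split_pool(fmap)        # four 2 x 2 sub-maps, 16 values in total
restored = merge_pool(parts)    # identical to fmap
pooled = max_pool(fmap)         # one 2 x 2 map, only 4 of 16 values survive
```

In this sketch, layers applied to each sub-map see the same reduced resolution that max-pooling would produce, which is how a large receptive field could be reached, while the merge step recovers a full-resolution output. Max-pooling, by contrast, keeps one value per window and cannot be inverted.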

