Parallel Grid Pooling for Data Augmentation
This work addresses a limitation of downsampling in CNN architectures for computer vision tasks. It offers an incremental improvement: downsampling that uses every intermediate feature instead of discarding some of them.
The paper tackles the problem that downsampling layers in CNNs discard intermediate features. It proposes parallel grid pooling (PGP), a layer that downsamples without feature loss and thereby acts as data augmentation. Experiments on image classification benchmarks show its effectiveness, and code is provided for verification.
Convolutional neural network (CNN) architectures utilize downsampling layers, which restrict the subsequent layers to learn spatially invariant features while reducing computational costs. However, such a downsampling operation makes it impossible to use the full spectrum of input features. Motivated by this observation, we propose a novel layer called parallel grid pooling (PGP) which is applicable to various CNN models. PGP performs downsampling without discarding any intermediate feature. It works as data augmentation and is complementary to commonly used data augmentation techniques. Furthermore, we demonstrate that a dilated convolution can naturally be represented using PGP operations, which suggests that the dilated convolution can also be regarded as a type of data augmentation technique. Experimental results based on popular image classification benchmarks demonstrate the effectiveness of the proposed method. Code is available at: https://github.com/akitotakeki
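To make the idea of downsampling without discarding features concrete, the following is a minimal sketch of the grid-splitting step only, not the authors' implementation. It assumes NumPy, an (N, C, H, W) layout, and spatial sizes divisible by the stride. The function name `parallel_grid_split` is hypothetical, and stacking the shifted grids along the batch axis is an assumption about how subsequent layers could process them with shared weights.

```python
import numpy as np


def parallel_grid_split(x, stride=2):
    """Split a feature map into stride**2 shifted, subsampled grids.

    x has shape (N, C, H, W). Ordinary strided subsampling keeps only the
    grid at offset (0, 0); here every offset (i, j) is kept, so no feature
    is discarded. The grids are stacked along the batch axis (an assumption
    made for this sketch).
    """
    n, c, h, w = x.shape
    if h % stride != 0 or w % stride != 0:
        raise ValueError("H and W must be divisible by stride in this sketch")
    grids = [
        x[:, :, i::stride, j::stride]
        for i in range(stride)
        for j in range(stride)
    ]
    return np.concatenate(grids, axis=0)


# Toy input: one 4x4 single-channel feature map with values 0..15.
x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
y = parallel_grid_split(x, stride=2)
print(y.shape)
print(y[:, 0])
```

Each of the four output maps has half the spatial resolution of the input, and together they contain every input value exactly once. Keeping only the first map would correspond to plain subsampling with stride 2.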