LiftPool: Bidirectional ConvNet Pooling
This work addresses a fundamental limitation of CNN architectures for computer vision: conventional pooling discards information. It offers a new pooling method that improves performance and robustness, although the contribution is incremental relative to existing pooling techniques.
The authors tackle the information loss and irreversibility of conventional pooling operations in convolutional neural networks by proposing LiftPool, a bidirectional pooling layer based on the Lifting Scheme. LiftPool improves image classification and semantic segmentation results across various backbones and is more robust to input corruptions.
Pooling is a critical operation in convolutional neural networks for increasing receptive fields and improving robustness to input variations. Most existing pooling operations downsample the feature maps, which is a lossy process. Moreover, they are not invertible: upsampling a downscaled feature map cannot recover the information lost during downsampling. By adopting the philosophy of the classical Lifting Scheme from signal processing, we propose LiftPool for bidirectional pooling layers, consisting of LiftDownPool and LiftUpPool. LiftDownPool decomposes a feature map into several downsized sub-bands, each of which contains information at a different frequency. Because the pooling function in LiftDownPool is perfectly invertible, performing LiftDownPool backward yields a corresponding up-pooling layer, LiftUpPool, which generates a refined upsampled feature map using the detail sub-bands; this is useful for image-to-image translation tasks. Experiments show that the proposed methods achieve better results on image classification and semantic segmentation with various backbones. Moreover, LiftDownPool offers better robustness to input corruptions and perturbations.
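The abstract describes LiftDownPool and LiftUpPool only at a high level. As a rough illustration of the underlying Lifting Scheme (split the signal into even and odd samples, predict one from the other, update, then run the same steps backward to invert), here is a minimal NumPy sketch under simplifying assumptions. It uses a fixed Haar-style predictor and updater, whereas LiftPool itself learns these functions. It also operates on a single 2D feature map with even height and width. All function names (lift_down_pool_2d, lift_up_pool_2d and the helpers) are hypothetical and are not the authors' API.

```python
import numpy as np

# Hypothetical sketch: fixed Haar-style lifting, NOT the learned LiftPool layers.


def _split(x: np.ndarray, axis: int):
    """Split step: separate even- and odd-indexed samples along `axis`."""
    n = x.shape[axis]
    if n % 2:
        raise ValueError("size along the lifting axis must be even")
    even = np.take(x, np.arange(0, n, 2), axis=axis)
    odd = np.take(x, np.arange(1, n, 2), axis=axis)
    return even, odd


def _merge(even: np.ndarray, odd: np.ndarray, axis: int) -> np.ndarray:
    """Inverse of _split: interleave even and odd samples along `axis`."""
    shape = list(even.shape)
    shape[axis] *= 2
    return np.stack((even, odd), axis=axis + 1).reshape(shape)


def lift_down_1d(x: np.ndarray, axis: int):
    """One lifting step along `axis`; returns (approximation, detail), each half-size."""
    even, odd = _split(x, axis)
    detail = odd - even          # predict: assume odd ~ even, keep the prediction error
    approx = even + detail / 2   # update: approximation becomes the pairwise mean
    return approx, detail


def lift_up_1d(approx: np.ndarray, detail: np.ndarray, axis: int) -> np.ndarray:
    """Run lift_down_1d backward: undo update, undo predict, then merge."""
    even = approx - detail / 2
    odd = detail + even
    return _merge(even, odd, axis)


def lift_down_pool_2d(x: np.ndarray):
    """Lift along width, then height, giving four half-resolution sub-bands."""
    low, high = lift_down_1d(x, axis=1)   # horizontal pass
    ll, lh = lift_down_1d(low, axis=0)    # vertical pass on the low band
    hl, hh = lift_down_1d(high, axis=0)   # vertical pass on the high band
    return ll, lh, hl, hh


def lift_up_pool_2d(ll, lh, hl, hh) -> np.ndarray:
    """Exact inverse of lift_down_pool_2d, using all detail sub-bands."""
    low = lift_up_1d(ll, lh, axis=0)
    high = lift_up_1d(hl, hh, axis=0)
    return lift_up_1d(low, high, axis=1)


if __name__ == "__main__":
    fmap = np.arange(16, dtype=float).reshape(4, 4)
    ll, lh, hl, hh = lift_down_pool_2d(fmap)
    print(ll.shape)                                             # (2, 2)
    print(np.allclose(lift_up_pool_2d(ll, lh, hl, hh), fmap))   # True
```

With this fixed Haar choice, the LL band equals 2x2 average pooling, while LH, HL and HH hold exactly the information that average pooling throws away, which is why running the steps backward reconstructs the input. Note that the inverse never needs to invert the predict or update functions themselves; it only reapplies them with the opposite sign. This is the property of lifting that lets learned predict and update functions still yield a perfectly invertible pooling layer.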