CV | LG | Dec 11, 2020

Cyclic orthogonal convolutions for long-range integration of features

arXiv:2012.06462v1
AI Analysis

Aimed at computer vision researchers, this work addresses the limited long-range feature integration of CNNs and offers an incremental improvement in architectural design.

The paper introduces CycleNet, a novel architecture that lets information flow across the entire image with fewer layers than a traditional CNN by using a cycle of three orthogonal convolutions across spatial and feature dimensions. CycleNet achieves competitive image classification results on CIFAR-10 and ImageNet compared with CNNs of similar size, and outperforms CNNs by a large margin on the Pathfinder challenge, which requires long-range feature integration.

In Convolutional Neural Networks (CNNs) information flows across a small neighbourhood of each pixel of an image, preventing long-range integration of features before reaching deep layers in the network. We propose a novel architecture that allows flexible information flow between features $z$ and locations $(x,y)$ across the entire image with a small number of layers. This architecture uses a cycle of three orthogonal convolutions, not only in $(x,y)$ coordinates, but also in $(x,z)$ and $(y,z)$ coordinates. We stack a sequence of such cycles to obtain our deep network, named CycleNet. As this only requires a permutation of the axes of a standard convolution, its performance can be directly compared to a CNN. Our model obtains competitive results at image classification on CIFAR-10 and ImageNet datasets, when compared to CNNs of similar size. We hypothesise that long-range integration favours recognition of objects by shape rather than texture, and we show that CycleNet transfers better than CNNs to stylised images. On the Pathfinder challenge, where integration of distant features is crucial, CycleNet outperforms CNNs by a large margin. We also show that even when employing a small convolutional kernel, the size of receptive fields of CycleNet reaches its maximum after one cycle, while conventional CNNs require a large number of layers.
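The receptive-field claim in the abstract can be illustrated with a small dependency-tracking sketch. It is not the authors' implementation: it ignores actual weights and only records which units can be influenced by a single input unit. It assumes a cubic tensor of side `size` indexed as $(x, y, z)$, stride 1, zero padding, no pooling, and a layer that applies a $k \times k$ window over two axes while mixing densely over the third (the channel axis). The ordering of the cycle as $(x,y)$, $(x,z)$, $(y,z)$ and all function names are assumptions made for illustration. Each orthogonal convolution is expressed as a standard convolution wrapped in an axis permutation, as the abstract describes.

```python
from itertools import product

def standard_conv_reach(active, size, k=3):
    """Spread influence through one standard conv layer: a k x k window over
    the first two axes, dense mixing over the third (channel) axis."""
    r = k // 2
    out = set()
    for a, b, _c in active:
        for da, db in product(range(-r, r + 1), repeat=2):
            na, nb = a + da, b + db
            if 0 <= na < size and 0 <= nb < size:
                # Every channel at an in-window position is influenced.
                out.update((na, nb, c) for c in range(size))
    return out

def permute(active, perm):
    """Reorder coordinates so that new_coord[i] = old_coord[perm[i]]."""
    return {tuple(p[i] for i in perm) for p in active}

def inverse(perm):
    """Return the permutation that undoes `perm`."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return tuple(inv)

def orthogonal_conv_reach(active, size, perm, k=3):
    """A convolution in another plane: permute axes, run the standard
    layer, then permute back to (x, y, z) order."""
    moved = permute(active, perm)
    out = standard_conv_reach(moved, size, k)
    return permute(out, inverse(perm))

def layers_to_full_reach(perms, size, k=3, max_layers=64):
    """Count layers until one corner input unit influences every unit."""
    active = {(0, 0, 0)}
    for n in range(1, max_layers + 1):
        perm = perms[(n - 1) % len(perms)]
        active = orthogonal_conv_reach(active, size, perm, k)
        if len(active) == size ** 3:
            return n
    return None

# Assumed cycle order: window over (x, y), then (x, z), then (y, z).
CYCLE = [(0, 1, 2), (0, 2, 1), (1, 2, 0)]
# A conventional CNN always convolves in the (x, y) plane.
CNN = [(0, 1, 2)]

print(layers_to_full_reach(CNN, size=8))    # 7
print(layers_to_full_reach(CYCLE, size=8))  # 3
```

Under these assumptions, one cycle of three layers is enough for a corner unit to reach the whole $8 \times 8 \times 8$ tensor with a $3 \times 3$ kernel, whereas the plain $(x,y)$ stack needs seven layers, and that number grows with the image side. For stride-1 layers the spread of influence from one input unit mirrors the receptive field of the corresponding output unit, which is why this count serves as a proxy for receptive-field size.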

