Kernel Transformer Networks for Compact Spherical Convolution
This work addresses the computational cost and accuracy loss of spherical image processing in computer vision, offering an incremental improvement in spherical convolution together with transferability across source models.
The paper tackles the problem of transferring convolutional neural networks (CNNs) from perspective images to spherical 360° imagery. It presents Kernel Transformer Networks (KTNs), which efficiently transfer convolution kernels while preserving the source CNN's accuracy, and which can be reused across source CNNs and recognition tasks without retraining the KTN.
Ideally, 360° imagery could inherit the deep convolutional neural networks (CNNs) already trained with great success on perspective projection images. However, existing methods to transfer CNNs from perspective to spherical images introduce significant computational costs and/or degradations in accuracy. In this work, we present the Kernel Transformer Network (KTN). KTNs efficiently transfer convolution kernels from perspective images to the equirectangular projection of 360° images. Given a source CNN for perspective images as input, the KTN produces a function parameterized by a polar angle and kernel as output. Given a novel 360° image, that function in turn can compute convolutions for arbitrary layers and kernels as would the source CNN on the corresponding tangent plane projections. Distinct from all existing methods, KTNs allow model transfer: the same model can be applied to different source CNNs with the same base architecture. This enables application to multiple recognition tasks without re-training the KTN. Validating our approach with multiple source CNNs and datasets, we show that KTNs improve the state of the art for spherical convolution. KTNs successfully preserve the source CNN's accuracy, while offering transferability, scalability to typical image resolutions, and, in many cases, a substantially lower memory footprint.
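To make the role of the polar angle concrete, the following minimal sketch shows why a kernel defined on a tangent plane needs a different shape at each row of an equirectangular image. It is not the KTN itself, which is a learned model. It only applies the standard inverse gnomonic (tangent-plane) projection to the taps of a small kernel. The function names, the 3x3 kernel size, and the tap spacing of 0.05 are illustrative assumptions, not values from the paper.

```python
import math


def tangent_to_sphere(x, y, lat0, lon0):
    """Inverse gnomonic projection on the unit sphere.

    Maps a point (x, y) on the plane tangent at (lat0, lon0) to
    (latitude, longitude) in radians. y points toward the north pole.
    """
    rho = math.hypot(x, y)
    if rho == 0.0:
        return lat0, lon0
    c = math.atan(rho)
    sin_c, cos_c = math.sin(c), math.cos(c)
    s = cos_c * math.sin(lat0) + y * sin_c * math.cos(lat0) / rho
    lat = math.asin(max(-1.0, min(1.0, s)))  # clamp against rounding error
    lon = lon0 + math.atan2(
        x * sin_c,
        rho * math.cos(lat0) * cos_c - y * math.sin(lat0) * sin_c,
    )
    return lat, lon


def kernel_footprint(polar_angle, kernel_size=3, tap_spacing=0.05):
    """Sphere coordinates of every tap of a square kernel.

    The kernel sits on the tangent plane at the given polar angle
    (0 at the north pole, pi/2 at the equator). kernel_size and
    tap_spacing are hypothetical values chosen for illustration.
    """
    lat0 = math.pi / 2 - polar_angle
    half = kernel_size // 2
    return [
        [tangent_to_sphere(j * tap_spacing, -i * tap_spacing, lat0, 0.0)
         for j in range(-half, half + 1)]
        for i in range(-half, half + 1)
    ]


def longitude_extent(polar_angle, **kwargs):
    """Width in longitude (radians) covered by the kernel footprint."""
    lons = [lon for row in kernel_footprint(polar_angle, **kwargs)
            for _, lon in row]
    return max(lons) - min(lons)


if __name__ == "__main__":
    for degrees in (90, 60, 30):
        extent = longitude_extent(math.radians(degrees))
        print(f"polar angle {degrees:>2} deg: longitude extent {extent:.4f} rad")
```

Running the script prints the footprint width for three polar angles. The same kernel covers a wider range of longitude as it moves from the equator toward the pole, which is the distortion that a function of polar angle and kernel has to account for in the equirectangular projection.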