CV · LG · ML · Jun 15, 2019

Visual Context-aware Convolution Filters for Transformation-invariant Neural Network

arXiv:1906.09986v1
AI Analysis

This work addresses transformation invariance in computer vision tasks such as image recognition. It offers a plugin framework for enhancing CNN architectures, though the contribution is incremental because it builds on existing CNN and multi-instance learning techniques.

The paper tackles transformation invariance in CNNs by proposing a visual context-aware filter generation module that incorporates contextual information into the convolution filters. It reports error rates of 1.13% on MNIST-rot-12k, 1.12% on Half-rotated MNIST, and 0.68% on Scaling MNIST, which the authors describe as significantly better than the state-of-the-art results.

We propose a novel visual context-aware filter generation module which incorporates contextual information present in images into Convolutional Neural Networks (CNNs). In contrast to traditional CNNs, we do not employ the same set of learned convolution filters for all input image instances. Our proposed input-conditioned convolution filters, when combined with techniques inspired by multi-instance learning and max-pooling, result in a transformation-invariant neural network. We investigated the performance of our proposed framework on three MNIST variations, which cover both rotation and scaling variance, and achieved 1.13% error on MNIST-rot-12k, 1.12% error on Half-rotated MNIST, and 0.68% error on Scaling MNIST, which is significantly better than the state-of-the-art results. We use visualization to further demonstrate the effectiveness of our visual context-aware convolution filters. Our proposed visual context-aware convolution filter generation framework can also serve as a plugin for any CNN-based architecture and enhance its modeling capacity.
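The abstract's two ingredients, filters conditioned on the input and max-pooling over a bag of transformed instances, can be illustrated with a small NumPy sketch. This is not the paper's architecture, which this page does not describe. The function names (`generate_filter`, `conv2d_valid`, `invariant_feature`), the quadrant-mean context vector, the linear generator (`W`, `b`), and the choice of four 90-degree rotations as the bag are all assumptions made for illustration.

```python
import numpy as np


def generate_filter(image, W, b, k=3):
    """Hypothetical filter generator: maps a coarse context vector to a k x k filter.

    Assumption: the "visual context" is just the mean intensity of the four
    image quadrants, and the generator is a single linear layer (W, b).
    """
    h, w = image.shape
    context = np.array([
        image[: h // 2, : w // 2].mean(),
        image[: h // 2, w // 2:].mean(),
        image[h // 2:, : w // 2].mean(),
        image[h // 2:, w // 2:].mean(),
    ])
    return (W @ context + b).reshape(k, k)


def conv2d_valid(image, kernel):
    """Plain 'valid' cross-correlation of a 2-D image with a square kernel."""
    k = kernel.shape[0]
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    return np.einsum("ijkl,kl->ij", windows, kernel)


def invariant_feature(image, W, b):
    """Scalar feature that is unchanged when the input is rotated by 90 degrees.

    Assumption: the bag of instances is the four 90-degree rotations of the
    input. Each instance gets its own generated filter, and the responses are
    max-pooled spatially and then across the bag.
    """
    bag = [np.rot90(image, r) for r in range(4)]
    responses = []
    for instance in bag:
        kernel = generate_filter(instance, W, b)  # filter depends on the instance
        feature_map = np.maximum(conv2d_valid(instance, kernel), 0.0)  # ReLU
        responses.append(feature_map.max())  # global spatial max-pool
    return max(responses)  # max over the bag of instances
```

Because rotating the input by 90 degrees only permutes the bag, the max over instances is unchanged, which is the sense in which this toy feature is transformation-invariant. With `W` of shape `(9, 4)` and `b` of shape `(9,)`, calling `invariant_feature(img, W, b)` and `invariant_feature(np.rot90(img), W, b)` should agree up to floating-point rounding. Invariance to arbitrary rotation angles or to scaling would need a different bag of transformations.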

