CV, LG, ML · Sep 27, 2018

Compressing the Input for CNNs with the First-Order Scattering Transform

arXiv:1809.10200v1 · 28 citations
Originality: Incremental advance
AI Analysis

This paper targets practitioners concerned with efficiency in computer vision, offering an incremental improvement in model compression and inference speed.

The paper tackles the problem of reducing the input size of CNNs to improve efficiency. It shows that a first-order scattering transform compresses images while keeping ImageNet classification accuracy comparable to ResNet-50, and that the resulting model gives faster inference and lower training memory usage on the Pascal VOC and COCO detection tasks.

We study the first-order scattering transform as a candidate for reducing the signal processed by a convolutional neural network (CNN). We show theoretical and empirical evidence that in the case of natural images and sufficiently small translation invariance, this transform preserves most of the signal information needed for classification while substantially reducing the spatial resolution and total signal size. We demonstrate that cascading a CNN with this representation performs on par with ImageNet classification models, commonly used in downstream tasks, such as the ResNet-50. We subsequently apply our trained hybrid ImageNet model as a base model on a detection system, which has typically larger image inputs. On Pascal VOC and COCO detection tasks we demonstrate improvements in the inference speed and training memory consumption compared to models trained directly on the input image.
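To make the size reduction described in the abstract concrete, the sketch below computes the output shape of a first-order scattering transform under the standard layout: each input channel yields one low-pass channel plus J × L wavelet-modulus channels, and the spatial grid is subsampled by 2^J along each axis. The function names and the values J = 3, L = 8 and a 3 × 224 × 224 input are illustrative assumptions, not settings quoted in the text above.

```python
def scattering_output_shape(channels, height, width, J, L):
    """Shape of a first-order scattering output for a (channels, height, width) input.

    Assumes the standard layout: per input channel, one low-pass channel plus
    J * L wavelet-modulus channels, all subsampled by 2**J along each axis.
    """
    stride = 2 ** J
    if height % stride or width % stride:
        raise ValueError("height and width must be divisible by 2**J")
    out_channels = channels * (1 + J * L)
    return out_channels, height // stride, width // stride


def compression_ratio(in_shape, out_shape):
    """Ratio of input element count to output element count."""
    n_in = in_shape[0] * in_shape[1] * in_shape[2]
    n_out = out_shape[0] * out_shape[1] * out_shape[2]
    return n_in / n_out


# Illustrative (assumed) setting: RGB 224x224 image, J = 3 scales, L = 8 orientations.
in_shape = (3, 224, 224)
out_shape = scattering_output_shape(*in_shape, J=3, L=8)
print(out_shape)  # (75, 28, 28)
print(round(compression_ratio(in_shape, out_shape), 2))  # 2.56
```

Under these assumed settings the spatial resolution drops by a factor of 8 per axis while the channel count grows, so the total signal size shrinks by about 2.56×. The CNN that follows then operates on a 28 × 28 grid rather than 224 × 224.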

Code Implementations: 1 repo
