QM CV, Jun 17, 2014

Replicating Kernels with a Short Stride Allows Sparse Reconstructions with Fewer Independent Kernels

Peter F. Schultz, Dylan M. Paiton, Wei Lu, Garrett T. Kenyon

arXiv:1406.4205v1, 27 citations

Originality: Incremental advance

AI Analysis

This work addresses the efficiency of sparse coding in computer vision, offering a method that reduces computational cost while maintaining performance. It appears incremental, since it builds on existing deconvolutional neural network frameworks.

The paper tackles the problem of reducing the number of independent kernels needed for sparse reconstructions in deconvolutional neural networks by investigating the relationship between stride, kernel count, and reconstruction quality. It finds that using a short stride (e.g., 2) with only eight kernels can achieve comparable reconstruction quality to using 512 kernels with a nonoverlapping stride (e.g., 16) for 16x16-pixel receptive fields.

In sparse coding it is common to tile an image into nonoverlapping patches, and then use a dictionary to create a sparse representation of each tile independently. In this situation, the overcompleteness of the dictionary is the number of dictionary elements divided by the patch size. In deconvolutional neural networks (DCNs), dictionaries learned on nonoverlapping tiles are replaced by a family of convolution kernels. Hence adjacent points in the feature maps (V1 layers) have receptive fields in the image that are translations of each other. The translational distance is determined by the dimensions of V1 in comparison to the dimensions of the image space. We refer to this translational distance as the stride. We implement a type of DCN using a modified Locally Competitive Algorithm (LCA) to investigate the relationship between the number of kernels, the stride, the receptive field size, and the quality of reconstruction. We find, for example, that for 16x16-pixel receptive fields, using eight kernels and a stride of 2 leads to sparse reconstructions of comparable quality to those obtained using 512 kernels and a stride of 16 (the nonoverlapping case). We also find that for a given stride and number of kernels, the patch size does not significantly affect reconstruction quality. Instead, the learned convolution kernels have a natural support radius independent of the patch size.
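The two configurations compared in the abstract can be related by simple counting. The sketch below is illustrative arithmetic, not the authors' code: it assumes a single-channel image, a square stride, and image dimensions divisible by the stride, and the function names `units_per_pixel` and `feature_map_units` are hypothetical. Under those assumptions, a feature map with a given stride holds one column of units per stride-by-stride block of pixels, so the number of V1 units per image pixel is the number of kernels divided by the stride squared.

```python
def units_per_pixel(num_kernels: int, stride: int) -> float:
    """V1 units per image pixel, assuming a single-channel image
    and the same stride in both spatial directions."""
    return num_kernels / (stride * stride)


def feature_map_units(height: int, width: int, num_kernels: int, stride: int) -> int:
    """Total number of V1 units for an image of the given size.
    Assumes height and width are exact multiples of the stride."""
    return (height // stride) * (width // stride) * num_kernels


if __name__ == "__main__":
    # The two cases named in the abstract, on a hypothetical 256x256 image.
    for kernels, stride in [(8, 2), (512, 16)]:
        print(
            f"kernels={kernels}, stride={stride}: "
            f"{units_per_pixel(kernels, stride)} units per pixel, "
            f"{feature_map_units(256, 256, kernels, stride)} units in total"
        )
```

Both cases give the same number of units per pixel, so by this count the short-stride network has the same number of V1 units as the nonoverlapping one while learning far fewer independent kernels.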
