DC · NE · Jun 14, 2016

A Systematic Approach to Blocking Convolutional Neural Networks

arXiv:1606.04209v1 · 55 citations
Originality: Incremental advance
AI Analysis

This work addresses performance bottlenecks in CNN implementations for computer vision applications, offering incremental improvements in efficiency for hardware and software systems.

The paper tackles the problem of blocking convolutional neural network (CNN) computations for memory locality. It develops an analytical model that automatically derives blockings, yielding up to an order-of-magnitude improvement in energy efficiency for custom hardware and up to a 90% reduction in memory accesses for x86 CPU implementations, compared with implementations built on hand-optimized BLAS libraries.
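To make "automatically derive blockings" concrete, here is a minimal sketch of the idea under a deliberately simplified, hypothetical cost model. It is not the paper's analytical model. It assumes one buffer level of `capacity` words, stride-1 valid convolution, tile sizes that evenly divide each dimension, and a fixed loop order in which output tiles stay resident while input channels are streamed. All function names (`traffic`, `buffer_words`, `best_blocking`) are illustrative. The search simply enumerates tile sizes and keeps the cheapest blocking that fits.

```python
from itertools import product


def divisors(n):
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def buffer_words(Fw, Fh, Xt, Yt, Ct, Kt):
    """Words needed to hold one input tile (with halo), one weight tile and one output tile."""
    in_tile = (Xt + Fw - 1) * (Yt + Fh - 1) * Ct
    w_tile = Fw * Fh * Ct * Kt
    out_tile = Xt * Yt * Kt
    return in_tile + w_tile + out_tile


def traffic(X, Y, C, K, Fw, Fh, Xt, Yt, Ct, Kt):
    """Estimated words moved between DRAM and the buffer (toy model).

    Assumed loop order: x-, y-, k-tiles outermost, c-tiles innermost.
    Each (x, y, k, c) tile step loads one input tile and one weight tile;
    the output tile stays resident across the c-tile loop and is written once.
    """
    n_steps = (X // Xt) * (Y // Yt) * (K // Kt) * (C // Ct)
    in_tile = (Xt + Fw - 1) * (Yt + Fh - 1) * Ct
    w_tile = Fw * Fh * Ct * Kt
    return n_steps * (in_tile + w_tile) + X * Y * K


def best_blocking(X, Y, C, K, Fw, Fh, capacity):
    """Exhaustively search tile sizes; return (cost, (Xt, Yt, Ct, Kt)) or None."""
    best = None
    for Xt, Yt, Ct, Kt in product(divisors(X), divisors(Y), divisors(C), divisors(K)):
        if buffer_words(Fw, Fh, Xt, Yt, Ct, Kt) > capacity:
            continue  # this blocking does not fit in the buffer
        cost = traffic(X, Y, C, K, Fw, Fh, Xt, Yt, Ct, Kt)
        if best is None or cost < best[0]:
            best = (cost, (Xt, Yt, Ct, Kt))
    return best
```

For example, `best_blocking(56, 56, 64, 64, 3, 3, 32 * 1024)` searches blockings of a 56×56 layer with 64 input and 64 output channels and 3×3 filters for a hypothetical 32K-word buffer. Brute force works here because the space is only a few thousand points. A realistic model would also cover multiple memory levels, loop orders, and non-dividing tile sizes.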

Convolutional Neural Networks (CNNs) are the state-of-the-art solution for many computer vision problems, and many researchers have explored optimized implementations. Most implementations heuristically block the computation to deal with the large data sizes and high data reuse of CNNs. This paper explores how to block CNN computations for memory locality by creating an analytical model for CNN-like loop nests. Using this model, we automatically derive optimized blockings for common networks that improve the energy efficiency of custom hardware implementations by up to an order of magnitude. Compared to traditional CNN CPU implementations based on highly tuned, hand-optimized BLAS libraries, our x86 programs implementing the optimal blocking reduce the number of memory accesses by up to 90%.
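For readers unfamiliar with the term, "blocking" a CNN-like loop nest means splitting each loop into an outer loop over tiles and an inner loop within a tile. The inner loops then touch only a small working set of inputs, weights, and outputs. Below is a minimal sketch, assuming one convolutional layer with no batch dimension, stride 1, and no padding, and only a single level of blocking. The function names and tile parameters (`Xt`, `Yt`, `Ct`, `Kt`) are illustrative and are not taken from the paper. Integer data is used so the reordered summation can be compared exactly.

```python
import random


def conv_naive(inp, w, X, Y, C, K, Fw, Fh):
    """Direct loop nest: out[k][y][x] = sum over c, fy, fx of inp[c][y+fy][x+fx] * w[k][c][fy][fx]."""
    out = [[[0] * X for _ in range(Y)] for _ in range(K)]
    for k in range(K):
        for y in range(Y):
            for x in range(X):
                for c in range(C):
                    for fy in range(Fh):
                        for fx in range(Fw):
                            out[k][y][x] += inp[c][y + fy][x + fx] * w[k][c][fy][fx]
    return out


def conv_blocked(inp, w, X, Y, C, K, Fw, Fh, Xt, Yt, Ct, Kt):
    """Same computation with x, y, c, k split into tile loops (outer) and intra-tile loops (inner)."""
    out = [[[0] * X for _ in range(Y)] for _ in range(K)]
    for x0 in range(0, X, Xt):
        for y0 in range(0, Y, Yt):
            for k0 in range(0, K, Kt):
                for c0 in range(0, C, Ct):
                    # The loops below touch only one tile of inputs, weights and
                    # outputs, i.e. the working set a small buffer would hold.
                    # min() handles tiles that do not divide a dimension evenly.
                    for k in range(k0, min(k0 + Kt, K)):
                        for y in range(y0, min(y0 + Yt, Y)):
                            for x in range(x0, min(x0 + Xt, X)):
                                acc = out[k][y][x]  # partial sum from earlier c-tiles
                                for c in range(c0, min(c0 + Ct, C)):
                                    for fy in range(Fh):
                                        for fx in range(Fw):
                                            acc += inp[c][y + fy][x + fx] * w[k][c][fy][fx]
                                out[k][y][x] = acc
    return out


if __name__ == "__main__":
    random.seed(0)
    X, Y, C, K, Fw, Fh = 6, 5, 4, 3, 3, 3
    inp = [[[random.randint(-3, 3) for _ in range(X + Fw - 1)]
            for _ in range(Y + Fh - 1)] for _ in range(C)]
    w = [[[[random.randint(-3, 3) for _ in range(Fw)] for _ in range(Fh)]
          for _ in range(C)] for _ in range(K)]
    same = conv_naive(inp, w, X, Y, C, K, Fw, Fh) == conv_blocked(
        inp, w, X, Y, C, K, Fw, Fh, Xt=4, Yt=2, Ct=3, Kt=2)
    print("blocked matches naive:", same)
```

Blocking changes only the order of the loop iterations, not the result. What the paper optimizes is the choice of tile sizes and loop order, because that choice determines how often each datum must be refetched from larger, more energy-hungry memories.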

Code Implementations: 1 repo