LG · AR · ML · Sep 18, 2020

GrateTile: Efficient Sparse Tensor Tiling for CNN Processing

arXiv:2009.08685v1
Originality: Highly original
AI Analysis

This addresses memory bottlenecks for CNN processing in hardware accelerators, offering a practical improvement.

The paper tackles inefficient use of memory bandwidth in CNN accelerators. It proposes GrateTile, a sparse tensor tiling scheme that reduces DRAM bandwidth by an average of 55% while requiring index storage of only 0.6% of the feature map size.

We propose GrateTile, an efficient, hardware-friendly data storage scheme for sparse CNN feature maps (activations). It divides data into uneven-sized sub-tensors and, with small indexing overhead, stores them in a compressed yet randomly accessible format. This design enables modern CNN accelerators to fetch and decompress sub-tensors on the fly in a tiled processing manner. GrateTile is suitable for architectures that favor aligned, coalesced data access, and requires only minimal changes to the overall architectural design. We simulate GrateTile with state-of-the-art CNNs and show an average DRAM bandwidth reduction of 55% while using only 0.6% of the feature map size for index storage.
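The abstract describes the storage scheme only at a high level, so here is a minimal Python sketch of the general idea: split a feature map into uneven sub-tensors, compress each one independently, and keep a small offset index so any single sub-tensor can be fetched and decompressed on its own. Several details are assumptions, not the paper's format: a 2-D single-channel map, a hypothetical repeating split pattern `[2, 6]` for the uneven sizes, zero-value compression (a per-element bitmask plus packed nonzeros) as the per-sub-tensor codec, and hypothetical 8-bit activations and 32-bit index entries. The names `SparseTiledMap`, `split_sizes`, `VALUE_BITS`, and `POINTER_BITS` are illustrative only.

```python
from itertools import accumulate

VALUE_BITS = 8     # hypothetical: 8-bit quantized activations
POINTER_BITS = 32  # hypothetical: width of one index entry


def split_sizes(length, pattern):
    """Cut `length` into pieces following a repeating pattern of uneven sizes;
    the last piece is truncated so the pieces sum to exactly `length`."""
    sizes, total, k = [], 0, 0
    while total < length:
        s = min(pattern[k % len(pattern)], length - total)
        sizes.append(s)
        total += s
        k += 1
    return sizes


class SparseTiledMap:
    """2-D feature map split into uneven sub-tensors. Each sub-tensor is stored
    with zero-value compression (bitmask + packed nonzeros) and located via an
    offset index, so one sub-tensor can be fetched without reading the others."""

    def __init__(self, fmap, pattern):
        self.row_edges = [0] + list(accumulate(split_sizes(len(fmap), pattern)))
        self.col_edges = [0] + list(accumulate(split_sizes(len(fmap[0]), pattern)))
        self.n_cols = len(self.col_edges) - 1
        self.masks, self.index, self.values = [], [], []
        for r0, r1 in zip(self.row_edges, self.row_edges[1:]):
            for c0, c1 in zip(self.col_edges, self.col_edges[1:]):
                flat = [v for row in fmap[r0:r1] for v in row[c0:c1]]
                self.masks.append([1 if v != 0 else 0 for v in flat])
                self.index.append(len(self.values))  # start offset of this sub-tensor
                self.values.extend(v for v in flat if v != 0)
        self.dense_elems = len(fmap) * len(fmap[0])

    def fetch(self, ti, tj):
        """Random access: decompress only sub-tensor (ti, tj)."""
        k = ti * self.n_cols + tj
        mask, start = self.masks[k], self.index[k]
        packed = iter(self.values[start:start + sum(mask)])
        flat = [next(packed) if m else 0 for m in mask]
        width = self.col_edges[tj + 1] - self.col_edges[tj]
        return [flat[i:i + width] for i in range(0, len(flat), width)]

    def stats(self):
        """Compressed traffic and index size, both relative to the dense map."""
        dense_bits = self.dense_elems * VALUE_BITS
        mask_bits = sum(len(m) for m in self.masks)  # one bit per element
        data_bits = mask_bits + len(self.values) * VALUE_BITS
        index_bits = len(self.index) * POINTER_BITS
        return {
            "bandwidth_saving": 1 - data_bits / dense_bits,
            "index_overhead": index_bits / dense_bits,
        }


if __name__ == "__main__":
    # Toy sparse map: roughly two thirds of the entries are zero.
    fmap = [[(r * 7 + c * 3) % 5 if (r + c) % 3 == 0 else 0 for c in range(16)]
            for r in range(16)]
    store = SparseTiledMap(fmap, pattern=[2, 6])
    print(store.fetch(1, 2))
    print(store.stats())
```

Because every sub-tensor has its own start offset, `fetch` decompresses one sub-tensor without touching the rest, which is the random-access property the abstract describes. On this 16×16 toy map the index costs far more than 0.6% of the feature map; the ratio shrinks as maps and sub-tensors grow, since each index entry is amortized over more elements.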
