AR · LG · May 2, 2022

Efficient Accelerator for Dilated and Transposed Convolution with Decomposition

arXiv:2205.02103v1 · 16 citations · h-index: 30
Originality: Synthesis-oriented
AI Analysis

This work is aimed at hardware designers and AI practitioners. It offers an incremental improvement that speeds up real-time tasks such as segmentation.

The paper tackles the inefficiency of hardware accelerators for dilated and transposed convolutions by proposing a decomposition method that reduces redundant computations, achieving an 87.8% reduction in cycle counts and an 8.2x speedup over naive execution for the ENet case.
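The two figures are consistent with each other, as a quick check shows. The sketch below assumes the 87.8% reduction is measured against the naive cycle count; the symbols $T_\text{naive}$ and $T_\text{proposed}$ are introduced here for illustration and do not come from the paper.

```latex
\text{speedup}
  = \frac{T_\text{naive}}{T_\text{proposed}}
  = \frac{T_\text{naive}}{(1 - 0.878)\,T_\text{naive}}
  = \frac{1}{0.122}
  \approx 8.2
```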

Hardware acceleration for dilated and transposed convolution enables real-time execution of related tasks such as segmentation, but current designs are either specific to these convolution types or, when reconfigurable, suffer from complex control. This paper presents a design that decomposes the input for dilated convolution and the weight for transposed convolution to skip redundant computations, so that both execute efficiently on existing dense CNN hardware as well. The proposed architecture cuts 87.8% of the cycle count, achieving an 8.2x speedup over naive execution for the ENet case.
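The decomposition idea can be illustrated in software. The following is a minimal 1-D NumPy sketch, not the paper's hardware dataflow: it assumes "valid" padding, a single channel, and the standard definitions of dilated and transposed convolution. All function names are hypothetical. A dilated convolution with dilation `d` is split into `d` dense convolutions on strided slices of the input, and a transposed convolution with stride `s` is split into `s` dense convolutions with strided slices of the weight, so no multiplications by inserted zeros are needed.

```python
import numpy as np


def dense_corr1d(x, w):
    """Plain dense 'valid' cross-correlation, standing in for a dense CNN engine."""
    n_out = len(x) - len(w) + 1
    if n_out <= 0:
        return np.zeros(0)
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n_out)])


def dilated_conv_reference(x, w, d):
    """Direct definition: y[i] = sum_k w[k] * x[i + k*d]."""
    n_out = len(x) - (len(w) - 1) * d
    return np.array([sum(w[k] * x[i + k * d] for k in range(len(w)))
                     for i in range(n_out)])


def dilated_conv_decomposed(x, w, d):
    """Input decomposition: one dense correlation per input phase p = 0..d-1."""
    y = np.zeros(max(len(x) - (len(w) - 1) * d, 0))
    for p in range(d):
        # Outputs at positions p, p+d, p+2d, ... depend only on x[p::d].
        y[p::d] = dense_corr1d(x[p::d], w)
    return y


def transposed_conv_reference(x, w, s):
    """Direct definition: each input sample scatters a scaled copy of w."""
    y = np.zeros((len(x) - 1) * s + len(w))
    for i, xi in enumerate(x):
        y[i * s:i * s + len(w)] += xi * w
    return y


def transposed_conv_decomposed(x, w, s):
    """Weight decomposition: one dense (full) convolution per weight phase."""
    y = np.zeros((len(x) - 1) * s + len(w))
    for p in range(s):
        w_p = w[p::s]          # sub-kernel that produces output phase p
        if len(w_p) == 0:      # stride larger than kernel: this phase stays zero
            continue
        y[p::s] = np.convolve(x, w_p)  # default mode is 'full'
    return y
```

A quick comparison against the direct definitions, using arbitrary illustrative sizes:

```python
rng = np.random.default_rng(0)
x, w = rng.standard_normal(23), rng.standard_normal(3)
print(np.allclose(dilated_conv_reference(x, w, 4), dilated_conv_decomposed(x, w, 4)))
print(np.allclose(transposed_conv_reference(x, w, 2), transposed_conv_decomposed(x, w, 2)))
```

In the naive alternative, a dense engine would process a zero-padded kernel of length `(K - 1) * d + 1` for the dilated case, or a zero-inserted input for the transposed case, and most of those multiplications would involve zeros. The decomposed form performs only the `K` useful multiplications per output.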
