AR, CV, DC, IV, SP · Dec 24, 2021

Fast 2D Convolutions and Cross-Correlations Using Scalable Architectures

arXiv:2112.13150v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the need for faster convolution computations in hardware implementations, though it appears incremental because it builds on existing transform-based methods.

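The paper tackles the problem of accelerating 2D convolutions and cross-correlations by mapping them to 1D operations in the transform domain using scalable architectures. For P×P blocks, the computation takes between O(P) and O(P^2) clock cycles depending on the available hardware resources, and the authors report that, given the amounts and types of resources required, the approach outperforms current methods on Virtex-7 FPGA and Zynq-SOC devices.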
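As a rough illustration of the transform-domain mapping, the NumPy sketch below uses the Discrete Periodic Radon Transform (DPRT) of an $N\times N$ array with $N$ prime. It relies on the property that the DPRT of a 2D circular convolution equals, direction by direction, the 1D circular convolution of the DPRTs of the image and the kernel. The indexing convention ($R(m,d)=\sum_j f(\langle d+mj\rangle_N, j)$ for $0\le m<N$, plus an extra column-sum direction $m=N$), the function names `dprt`, `idprt`, `circ_conv1d`, `conv2d_dprt` and `xcorr2d_dprt`, and the sequential loops are all illustrative assumptions. This is a software model of the math, not the paper's hardware architecture.

```python
import numpy as np


def dprt(f):
    """Forward DPRT of an N x N array, N prime (sketch; indexing convention assumed).

    Rows 0..N-1 hold sums along the periodic lines i = <d + m*j>_N;
    row N holds the column sums (the extra direction).
    """
    f = np.asarray(f, dtype=float)
    n = f.shape[0]
    cols = np.arange(n)
    r = np.empty((n + 1, n))
    for m in range(n):
        for d in range(n):
            r[m, d] = f[(d + m * cols) % n, cols].sum()
    r[n] = f.sum(axis=0)
    return r


def idprt(r):
    """Inverse DPRT: f(i,j) = (sum_m R(m, <i - m*j>) - S + R(N, j)) / N."""
    n = r.shape[1]
    total = r[0].sum()  # every direction sums to the total image sum S
    ms = np.arange(n)
    f = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            f[i, j] = (r[ms, (i - ms * j) % n].sum() - total + r[n, j]) / n
    return f


def circ_conv1d(a, b):
    """1D circular convolution: c(d) = sum_t a(t) * b(<d - t>_N)."""
    n = len(a)
    t = np.arange(n)
    return np.array([np.dot(a, b[(d - t) % n]) for d in range(n)])


def conv2d_dprt(f, h):
    """2D circular convolution computed as N+1 independent 1D circular convolutions."""
    rf, rh = dprt(f), dprt(h)
    rg = np.array([circ_conv1d(rf[m], rh[m]) for m in range(rf.shape[0])])
    return idprt(rg)


def xcorr2d_dprt(f, h):
    """2D circular cross-correlation c(i,j) = sum_{u,v} h(u,v) f(<i+u>, <j+v>)."""
    h = np.asarray(h, dtype=float)
    n = h.shape[0]
    idx = (-np.arange(n)) % n
    h_flipped = h[np.ix_(idx, idx)]  # h(<-i>_N, <-j>_N)
    return conv2d_dprt(f, h_flipped)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = rng.standard_normal((7, 7))
    h = rng.standard_normal((7, 7))
    # reference: circular convolution via the 2D FFT
    ref = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h)))
    print(np.allclose(conv2d_dprt(f, h), ref))  # True
```

Each of the $N+1$ directions is an independent 1D circular convolution, which is what makes this kind of mapping amenable to parallel hardware. For ordinary (linear) convolution, both arrays would first need to be zero-padded to a prime $N$ at least as large as the full output size.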

The manuscript describes fast and scalable architectures and associated algorithms for computing convolutions and cross-correlations. The basic idea is to map 2D convolutions and cross-correlations to a collection of 1D convolutions and cross-correlations in the transform domain. This is accomplished through the use of the Discrete Periodic Radon Transform (DPRT) for general kernels and the use of SVD-LU decompositions for low-rank kernels. The approach uses scalable architectures that can be fitted into modern FPGA and Zynq-SOC devices. Depending on the types of available resources, 2D convolutions and cross-correlations of $P\times P$ blocks can be computed in as few as $O(P)$ clock cycles and in at most $O(P^2)$ clock cycles. Thus, there is a trade-off between performance and the required numbers and types of resources. We provide implementations of the proposed architectures using modern programmable devices (Virtex-7 and Zynq-SOC). Based on the amounts and types of required resources, we show that the proposed approaches significantly outperform current methods.
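For low-rank kernels the abstract mentions SVD-LU decompositions. The simpler idea underneath can be sketched with a plain SVD: writing $H=\sum_{k=1}^{r}\sigma_k u_k v_k^T$, a 2D convolution with $H$ becomes $r$ pairs of 1D convolutions (down the columns, then along the rows). This sketch uses NumPy's floating-point `np.linalg.svd`, and the helper names `conv2d_full` and `conv2d_lowrank` are hypothetical. It does not reproduce the paper's SVD-LU scheme or its hardware mapping.

```python
import numpy as np


def conv2d_full(f, h):
    """Reference full (linear) 2D convolution: g(i,j) = sum_{a,b} h(a,b) f(i-a, j-b)."""
    f = np.asarray(f, dtype=float)
    h = np.asarray(h, dtype=float)
    pf, qf = f.shape
    ph, qh = h.shape
    g = np.zeros((pf + ph - 1, qf + qh - 1))
    for a in range(ph):
        for b in range(qh):
            g[a:a + pf, b:b + qf] += h[a, b] * f
    return g


def conv2d_lowrank(f, h, tol=1e-10):
    """Full 2D convolution as a sum of separable (column, row) 1D convolutions.

    The kernel is factored as h = sum_k s_k * outer(u_k, v_k); singular values
    below tol * s_max are dropped, so the work scales with the numerical rank.
    """
    f = np.asarray(f, dtype=float)
    h = np.asarray(h, dtype=float)
    u, s, vt = np.linalg.svd(h)
    rank = int(np.sum(s > tol * s[0])) if s[0] > 0 else 0
    g = np.zeros((f.shape[0] + h.shape[0] - 1, f.shape[1] + h.shape[1] - 1))
    for k in range(rank):
        # 1D convolution down every column of f with the scaled left singular vector
        tmp = np.apply_along_axis(np.convolve, 0, f, s[k] * u[:, k])
        # 1D convolution along every row with the right singular vector
        g += np.apply_along_axis(np.convolve, 1, tmp, vt[k])
    return g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = rng.standard_normal((8, 8))
    # rank-2 5x5 kernel: two separable terms instead of one dense 5x5 kernel
    h = np.outer(rng.standard_normal(5), rng.standard_normal(5)) + np.outer(
        rng.standard_normal(5), rng.standard_normal(5)
    )
    print(np.allclose(conv2d_lowrank(f, h), conv2d_full(f, h)))  # True
```

Each retained singular value costs one pair of 1D convolutions, so a rank-$r$ kernel of size $Q\times Q$ needs roughly $2rQ$ multiplies per output sample instead of $Q^2$. This is the usual reason low-rank kernels are worth treating separately from general ones.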
