AR CV DC IV SP
Dec 24, 2021

Fast 2D Convolutions and Cross-Correlations Using Scalable Architectures

Cesar Carranza, Daniel Llamocca, Marios Pattichis

arXiv:2112.13150v1
2.31 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for faster convolution computations in hardware implementations, though it appears incremental as it builds on existing transform-based methods.

The paper accelerates 2D convolutions and cross-correlations by mapping them to 1D operations in the transform domain. Its scalable architectures compute a P x P block in O(P) to O(P^2) clock cycles, depending on the resources available, and the authors report that their FPGA and Zynq-SOC implementations outperform current methods.

The manuscript describes fast and scalable architectures and associated algorithms for computing convolutions and cross-correlations. The basic idea is to map 2D convolutions and cross-correlations to a collection of 1D convolutions and cross-correlations in the transform domain. This is accomplished through the use of the Discrete Periodic Radon Transform (DPRT) for general kernels and the use of SVD-LU decompositions for low-rank kernels. The approach uses scalable architectures that can be fitted into modern FPGA and Zynq-SOC devices. Based on different types of available resources, for $P\times P$ blocks, 2D convolutions and cross-correlations can be computed in just $O(P)$ clock cycles up to $O(P^2)$ clock cycles. Thus, there is a trade-off between performance and required numbers and types of resources. We provide implementations of the proposed architectures using modern programmable devices (Virtex-7 and Zynq-SOC). Based on the amounts and types of required resources, we show that the proposed approaches significantly outperform current methods.
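The two mappings named in the abstract can be illustrated in software. The following is a minimal NumPy sketch, not the authors' hardware architecture. It assumes square $N\times N$ inputs with $N$ prime and circular (periodic) convolution; linear convolution would follow by zero-padding both inputs to a large enough prime size. All function names are hypothetical, and the low-rank part uses a plain SVD and omits the LU step the paper pairs with it.

```python
import numpy as np

def circ_conv_1d(a, b):
    """Direct 1D circular convolution: out[d] = sum_k a[k] * b[(d - k) mod n]."""
    n = len(a)
    return np.array([sum(a[k] * b[(d - k) % n] for k in range(n)) for d in range(n)])

def circ_conv_2d(f, g):
    """Reference 2D circular convolution, computed directly in O(N^4)."""
    n = f.shape[0]
    h = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            # np.roll shifts g so that entry [i, j] holds g[i - a, j - b].
            h += f[a, b] * np.roll(g, (a, b), axis=(0, 1))
    return h

def dprt(f):
    """Forward DPRT of an N x N array (N prime): N + 1 projections of length N."""
    n = f.shape[0]
    rows = np.arange(n)
    r = np.zeros((n + 1, n))
    for m in range(n):
        for d in range(n):
            # Projection m, offset d: sum over rows i of f[i, (d + m*i) mod N].
            r[m, d] = f[rows, (d + m * rows) % n].sum()
    r[n] = f.sum(axis=1)  # the extra projection holds the row sums
    return r

def idprt(r):
    """Inverse DPRT; valid only when N is prime."""
    n = r.shape[1]
    total = r[0].sum()  # every projection sums to the total of all samples
    ms = np.arange(n)
    f = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            f[i, j] = (r[ms, (j - ms * i) % n].sum() - total + r[n, i]) / n
    return f

def conv2d_via_dprt(f, g):
    """2D circular convolution as N + 1 independent 1D circular convolutions."""
    rf, rg = dprt(f), dprt(g)
    rh = np.array([circ_conv_1d(rf[m], rg[m]) for m in range(rf.shape[0])])
    return idprt(rh)

def conv2d_low_rank(f, g, rank):
    """2D circular convolution with a rank-`rank` kernel via SVD (no LU step)."""
    n = f.shape[0]
    u, s, vt = np.linalg.svd(g)
    h = np.zeros((n, n))
    for k in range(rank):
        col, row = s[k] * u[:, k], vt[k]  # rank-1 term: outer(col, row)
        # Convolve every row with `row`, then every column with `col`.
        tmp = np.array([circ_conv_1d(f[i], row) for i in range(n)])
        h += np.array([circ_conv_1d(tmp[:, j], col) for j in range(n)]).T
    return h

rng = np.random.default_rng(0)
f, g = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
print(np.allclose(conv2d_via_dprt(f, g), circ_conv_2d(f, g)))
```

In the DPRT path, the $N+1$ 1D convolutions are independent of one another, which is what makes the mapping attractive for parallel hardware. In the low-rank path, a rank-$r$ kernel costs $2r$ passes of 1D convolutions instead of one full 2D pass.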
