ARCVDCIVSPDec 24, 2021

Fast and Scalable Computation of the Forward and Inverse Discrete Periodic Radon Transform

arXiv:2112.13149v121 citations
Originality Incremental advance
AI Analysis

This work addresses the need for faster image reconstruction in applications like medical imaging or tomography, though it is incremental as it builds on existing DPRT methods with hardware optimizations.

The paper tackles the problem of efficiently computing the forward and inverse Discrete Periodic Radon Transform (DPRT) for image reconstruction by proposing a fast and scalable hardware architecture using parallel adder trees, circular shift registers, and image block-based methods, resulting in up to N^2 additions per clock cycle and a 36 times faster computation for a 251x251 image compared to previous systolic implementations.

The Discrete Periodic Radon Transform (DPRT) has been extensively used in applications that involve image reconstructions from projections. This manuscript introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: (i) a parallel array of fixed-point adder trees, (ii) circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees, (iii) an image block-based approach to DPRT computation that can fit the proposed architecture to available resources, and (iv) fast transpositions that are computed in one or a few clock cycles that do not depend on the size of the input image. As a result, for an $N\times N$ image ($N$ prime), the proposed approach can compute up to $N^{2}$ additions per clock cycle. Compared to previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For example, for a $251\times 251$ image, for approximately $25\%$ fewer flip-flops than required for a systolic implementation, we have that the scalable DPRT is computed 36 times faster. For the fastest case, we introduce optimized architectures that can compute the DPRT and its inverse in just $2N+\left\lceil \log_{2}N\right\rceil+1$ and $2N+3\left\lceil \log_{2}N\right\rceil+B+2$ cycles respectively, where $B$ is the number of bits used to represent each input pixel. On the other hand, the scalable DPRT approach requires more 1-bit additions than for the systolic implementation and provides a trade-off between speed and additional 1-bit additions. All of the proposed DPRT architectures were implemented in VHDL and validated using an FPGA implementation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes