AR | DC | LG | Jul 10, 2025

Accelerating Transposed Convolutions on FPGA-based Edge Devices

arXiv:2507.07683v1 | h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses performance bottlenecks for generative models on resource-constrained edge devices and represents an incremental improvement over existing accelerators.

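The paper tackles the inefficiency of Transposed Convolutions (TCONV) in generative AI models on edge devices by proposing MM2IM, a hardware-software co-designed accelerator. It reports an average speedup of 1.9x over a dual-thread ARM Neon optimized CPU baseline across 261 TCONV problem configurations, up to 4.2x speedup on TCONV layers from well-known generative models, and up to 3x speedup with 2.4x energy reduction on the DCGAN and pix2pix GAN models relative to the CPU baseline.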

Transposed Convolutions (TCONV) enable the up-scaling mechanism within generative Artificial Intelligence (AI) models. However, the predominant Input-Oriented Mapping (IOM) method for implementing TCONV has complex output mapping, overlapping sums, and ineffectual computations. These inefficiencies further exacerbate the performance bottleneck of TCONV and generative models on resource-constrained edge devices. To address this problem, in this paper we propose MM2IM, a hardware-software co-designed accelerator that combines Matrix Multiplication (MatMul) with col2IM to process TCONV layers on resource-constrained edge devices efficiently. Using the SECDA-TFLite design toolkit, we implement MM2IM and evaluate its performance across 261 TCONV problem configurations, achieving an average speedup of 1.9x against a dual-thread ARM Neon optimized CPU baseline. We then evaluate the performance of MM2IM on a range of TCONV layers from well-known generative models achieving up to 4.2x speedup, and compare it against similar resource-constrained TCONV accelerators, outperforming them by at least 2x GOPs/DSP. Finally, we evaluate MM2IM on the DCGAN and pix2pix GAN models, achieving up to 3x speedup and 2.4x energy reduction against the CPU baseline.
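The abstract describes TCONV as a MatMul followed by col2IM. The NumPy sketch below illustrates that general formulation in software only; it is not the MM2IM hardware design. The function names, the (height, width, channels) tensor layout, the (Kh, Kw, Ic, Oc) weight layout, and the absence of padding are all assumptions made for illustration. A direct input-oriented loop is included as a reference to check the result against.

```python
import numpy as np


def tconv_mm_col2im(x, w, stride):
    """Transposed convolution as one MatMul plus a col2im scatter-add.

    Assumed layouts (illustrative, not taken from the paper):
      x: (Ih, Iw, Ic) input feature map
      w: (Kh, Kw, Ic, Oc) filter
    No padding or output cropping is applied.
    """
    ih, iw, ic = x.shape
    kh, kw, ic_w, oc = w.shape
    assert ic == ic_w, "input channel mismatch"
    oh = (ih - 1) * stride + kh
    ow = (iw - 1) * stride + kw

    # MatMul: every input pixel produces a full Kh x Kw x Oc patch.
    w_mat = w.transpose(2, 0, 1, 3).reshape(ic, kh * kw * oc)
    cols = (x.reshape(ih * iw, ic) @ w_mat).reshape(ih, iw, kh, kw, oc)

    # col2im: accumulate the patches into the output, where they overlap.
    out = np.zeros((oh, ow, oc), dtype=cols.dtype)
    for a in range(kh):
        for b in range(kw):
            # Kernel offset (a, b) of input pixel (i, j) lands at
            # output position (i * stride + a, j * stride + b).
            out[a:a + ih * stride:stride, b:b + iw * stride:stride, :] += cols[:, :, a, b, :]
    return out


def tconv_reference(x, w, stride):
    """Direct input-oriented loop, used only to check the sketch above."""
    ih, iw, _ = x.shape
    kh, kw, _, oc = w.shape
    out = np.zeros(((ih - 1) * stride + kh, (iw - 1) * stride + kw, oc),
                   dtype=np.result_type(x, w))
    for i in range(ih):
        for j in range(iw):
            for a in range(kh):
                for b in range(kw):
                    out[i * stride + a, j * stride + b, :] += x[i, j, :] @ w[a, b, :, :]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 5, 3))
    w = rng.standard_normal((3, 3, 3, 2))
    print(np.allclose(tconv_mm_col2im(x, w, 2), tconv_reference(x, w, 2)))
```

In this formulation the MatMul is dense and regular, while the overlapping sums mentioned in the abstract are confined to the col2im accumulation step. With padding, some of the MatMul outputs would fall outside the cropped output and be discarded, which is one way the ineffectual computations mentioned above can arise; this sketch does not model that case.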
