Hardware-Efficient Deconvolution-Based GAN for Edge Computing
This work addresses the problem of deploying GANs on resource-limited edge devices and offers a practical solution for real-time applications. The contribution is incremental, however, since it builds on existing GAN and hardware optimization techniques.
The paper tackles the high computational and memory costs of GANs by proposing a hardware-efficient quantized deconvolution GAN (QDCGAN) implemented on FPGA for edge computing. The design achieves an improved trade-off between throughput and resource utilization and enables low-power inference on resource-constrained platforms.
Generative Adversarial Networks (GANs) are cutting-edge algorithms for generating new data samples based on the learned data distribution. However, their performance comes at a significant cost in terms of computation and memory requirements. In this paper, we propose an HW/SW co-design approach for training a quantized deconvolution GAN (QDCGAN) implemented on FPGA using a scalable streaming dataflow architecture capable of achieving a higher throughput versus resource utilization trade-off. The developed accelerator is based on an efficient deconvolution engine that offers high parallelism with respect to scaling factors for GAN-based edge computing. Furthermore, various precisions, datasets, and network scalability are analyzed for low-power inference on resource-constrained platforms. Lastly, an end-to-end open-source framework is provided for training, implementation, state-space exploration, and scaling the inference using Vivado high-level synthesis for Xilinx SoC-FPGAs, along with a comparison testbed with Jetson Nano.
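The abstract does not describe the internals of the deconvolution engine. For readers unfamiliar with the operation, the following is a minimal software sketch of a quantized deconvolution (transposed convolution) on a single channel, assuming uniform symmetric quantization, a square kernel, and no padding. The function names `quantize` and `deconv2d` and all parameters are hypothetical illustrations and are not taken from the paper's framework or hardware design.

```python
from typing import List

Matrix = List[List[int]]


def quantize(values: List[List[float]], bits: int, scale: float) -> Matrix:
    """Uniform symmetric quantization to signed `bits`-bit integers.

    Assumption: a single per-tensor `scale`; the paper's scheme may differ.
    """
    qmax = 2 ** (bits - 1) - 1
    qmin = -qmax - 1
    return [
        [max(qmin, min(qmax, round(v / scale))) for v in row]
        for row in values
    ]


def deconv2d(x: Matrix, w: Matrix, stride: int) -> Matrix:
    """Single-channel transposed convolution with a square kernel, no padding.

    Each input pixel scatters a scaled copy of the kernel into the output,
    and overlapping contributions are accumulated as integers.
    """
    h, wd = len(x), len(x[0])
    k = len(w)
    out_h = (h - 1) * stride + k
    out_w = (wd - 1) * stride + k
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(wd):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * w[a][b]
    return out


# Usage: a 2x2 input upsampled to 4x4 with a 2x2 all-ones kernel, stride 2.
x_q = [[1, 2], [3, 4]]
w_q = [[1, 1], [1, 1]]
y = deconv2d(x_q, w_q, stride=2)
# y == [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

In this sketch the four nested loops are independent across input pixels and kernel positions, which is the kind of structure a streaming hardware engine could in principle unroll in parallel; how the paper's accelerator actually does this is not stated in the abstract.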