CV DC Mar 23, 2018

Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs

Chuanhao Zhuge, Xinheng Liu, Xiaofan Zhang, Sudeep Gummadi, Jinjun Xiong, Deming Chen

arXiv:1803.09004v17.337 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of low-latency deployment of deep CNNs for face recognition on FPGAs, offering incremental improvements in acceleration techniques.

The paper tackled the challenge of deploying deep CNNs for latency-critical tasks like face recognition by exploring hybrid fast convolution algorithms (Winograd and FFT) and optimizing parallelism for novel architectures such as Inception modules. The result was a configurable FPGA-based system that achieved a 3.75x latency speedup compared to a high-end NVIDIA GPU and significantly surpassed previous FPGA results.

Deep Convolutional Neural Networks (CNNs) have become a Swiss Army knife for solving critical artificial intelligence tasks. However, deploying deep CNN models for latency-critical tasks remains challenging because of the complex nature of CNNs. Recently, FPGAs have become a favorable platform for accelerating deep CNNs thanks to their high parallel processing capability and energy efficiency. In this work, we explore different fast convolution algorithms, including Winograd and Fast Fourier Transform (FFT), and find an optimal strategy for applying them together to different types of convolutions. We also propose an optimization scheme to exploit parallelism in novel CNN architectures such as the Inception modules in GoogLeNet. We implement a configurable IP-based face recognition acceleration system based on FaceNet using High-Level Synthesis. Our implementation on a Xilinx Ultrascale device achieves a 3.75x latency speedup compared to a high-end NVIDIA GPU and significantly surpasses previous FPGA results.
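As a rough illustration of why Winograd-style convolution is attractive on hardware with limited multipliers, the sketch below shows the standard one-dimensional minimal filtering algorithm F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of six. This is a textbook example, not the paper's HLS implementation; the function names `direct_f23` and `winograd_f23` are hypothetical and chosen only for this illustration.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap sliding dot product (6 multiplies)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs using 4 data-dependent multiplies.

    d: four input samples, g: three filter taps.
    The filter-side terms (g0+g1+g2)/2 and (g0-g1+g2)/2 depend only on the
    weights, so in practice they can be precomputed once per filter.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g

    # Filter transform (can be precomputed offline for fixed weights).
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2

    # Input transform followed by four element-wise multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3

    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    taps = [1.0, 0.0, -1.0]
    print(direct_f23(data, taps))
    print(winograd_f23(data, taps))
```

The two functions should agree up to floating-point rounding. The 2D variants used for CNN layers nest this transform along both spatial dimensions, and the saving in multiplications matters most for small kernels such as 3x3, which is one reason an approach like the paper's might pair Winograd with FFT for other kernel sizes.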
