CV AR LG Nov 23, 2017

fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs

Stylianos I. Venieris, Christos-Savvas Bouganis

arXiv:1711.08740v1 10.435 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of efficiently deploying ConvNets on embedded systems for applications such as video surveillance and autonomous cars. It offers a domain-specific toolflow that automates the mapping of ConvNets to FPGAs, which is an incremental advance.

The authors tackled the challenge of mapping diverse convolutional neural networks onto embedded FPGAs by developing fpgaConvNet, an automated end-to-end framework that optimizes for throughput, latency, or multiobjective criteria, resulting in designs that improve performance by up to 6.65x over optimized embedded GPUs under the same power constraints.

In recent years, Convolutional Neural Networks (ConvNets) have become an enabling technology for a wide range of novel embedded Artificial Intelligence systems. Across the range of applications, the performance needs vary significantly, from high-throughput video surveillance to the very low-latency requirements of autonomous cars. In this context, FPGAs can provide a potential platform that can be optimally configured based on the different performance needs. However, the complexity of ConvNet models keeps increasing, making their mapping to an FPGA device a challenging task. This work presents fpgaConvNet, an end-to-end framework for mapping ConvNets on FPGAs. The proposed framework employs an automated design methodology based on the Synchronous Dataflow (SDF) paradigm and defines a set of SDF transformations in order to efficiently explore the architectural design space. By selectively optimising for throughput, latency or multiobjective criteria, the presented tool is able to efficiently explore the design space and generate hardware designs from high-level ConvNet specifications, explicitly optimised for the performance metric of interest. Overall, our framework yields designs that improve the performance by up to 6.65x over highly optimised embedded GPU designs for the same power constraints in embedded environments.
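
For readers unfamiliar with the Synchronous Dataflow paradigm mentioned in the abstract, the following is a minimal sketch of its core idea, not the fpgaConvNet implementation. In an SDF graph each actor produces and consumes a fixed number of tokens per firing, so a periodic schedule exists only if the balance equations `q[src] * produce == q[dst] * consume` have a positive integer solution. The function name `repetition_vector`, the actor names, and the token rates below are all hypothetical and chosen for illustration only.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm requires Python 3.9+


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * produce == q[dst] * consume.

    actors: list of actor names (the graph is assumed to be connected).
    edges:  list of (src, dst, produce, consume) token rates per firing.
    Returns the smallest positive integer firing counts per actor, or
    raises ValueError if the rates are inconsistent or the graph is
    not connected.
    """
    # Fix the first actor's relative rate at 1 and propagate along edges.
    rate = {actors[0]: Fraction(1)}
    pending = list(edges)
    progress = True
    while pending and progress:
        progress = False
        remaining = []
        for src, dst, prod, cons in pending:
            if src in rate and dst in rate:
                # Both ends known: the edge must agree with the solution.
                if rate[src] * prod != rate[dst] * cons:
                    raise ValueError("inconsistent rates")
                progress = True
            elif src in rate:
                rate[dst] = rate[src] * prod / cons
                progress = True
            elif dst in rate:
                rate[src] = rate[dst] * cons / prod
                progress = True
            else:
                remaining.append((src, dst, prod, cons))
        pending = remaining
    if pending or len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational rates to the smallest integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}


# Hypothetical three-stage pipeline with made-up token rates.
q = repetition_vector(
    ["conv", "pool", "fc"],
    [("conv", "pool", 1, 4), ("pool", "fc", 1, 2)],
)
print(q)  # {'conv': 8, 'pool': 2, 'fc': 1}
```

Because the firing counts are fixed at design time, a graph of this kind can be scheduled statically, which is the general property that makes SDF a natural fit for describing streaming hardware.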
