CV, AI, LG | May 22, 2018

CascadeCNN: Pushing the performance limits of quantisation

arXiv:1805.08743v1 | 25 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient, high-throughput CNN inference, particularly on resource-constrained devices such as FPGAs. It is an incremental advance in that it builds on existing quantisation and cascade methods.

The authors improve CNN inference throughput by exploiting the trade-off between computation time and accuracy. Without retraining, they report performance gains of up to 55% for VGG-16 and 48% for AlexNet over the baseline design at the same resource budget and accuracy.
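The time-accuracy trade-off can be illustrated with a simple cost model. This is a rough sketch, not the paper's performance model: it assumes every sample passes through a fast low-precision stage and that only a fraction of samples is re-processed by a slower high-precision stage. The function name and all timing values are hypothetical.

```python
def cascade_speedup(t_baseline, t_low, t_high, forward_fraction):
    """Estimate speedup of a two-stage cascade over a single-stage baseline.

    Assumed model (not taken from the paper): each sample costs t_low in
    the low-precision stage, and a fraction `forward_fraction` of samples
    additionally costs t_high in the high-precision stage.
    """
    if not 0.0 <= forward_fraction <= 1.0:
        raise ValueError("forward_fraction must lie in [0, 1]")
    avg_cascade_time = t_low + forward_fraction * t_high
    return t_baseline / avg_cascade_time


# Hypothetical timings in arbitrary units, chosen only for illustration.
speedup = cascade_speedup(t_baseline=1.0, t_low=0.5, t_high=1.0,
                          forward_fraction=0.2)
print(f"estimated speedup: {speedup:.2f}x")
```

Under this model the cascade pays off only while t_low + forward_fraction * t_high stays below the baseline time, so a low-precision stage that forwards too many samples can end up slower than the baseline.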

This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, to perform high-throughput inference by exploiting the computation time-accuracy trade-off. Without the need for retraining, a two-stage architecture tailored for any given FPGA device is generated, consisting of a low- and a high-precision unit. A confidence evaluation unit is employed between them to identify misclassified cases at run time and forward them to the high-precision unit or terminate computation. Experiments demonstrate that CascadeCNN achieves a performance boost of up to 55% for VGG-16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy.
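The two-stage flow described in the abstract can be sketched in software as follows. This is a toy illustration under stated assumptions, not the authors' toolflow or hardware design: the model is a single linear layer, the quantiser is a plain uniform symmetric scheme, and the confidence check is a top-1 versus top-2 probability margin. The abstract does not specify the confidence metric, the bit widths, or the threshold, so all of those (and every function name) are hypothetical.

```python
import math


def quantise_weights(weights, bits):
    """Uniform symmetric quantisation of a weight matrix (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    max_abs = max(abs(w) for row in weights for w in row) or 1.0
    scale = max_abs / qmax
    return [[max(qmin, min(qmax, round(w / scale))) * scale for w in row]
            for row in weights]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(weights, x):
    """Toy single-layer classifier: returns class probabilities."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)


def is_confident(probs, threshold):
    """Assumed confidence check: margin between the two largest probabilities."""
    best, second = sorted(probs, reverse=True)[:2]
    return best - second >= threshold


def cascade_predict(x, weights, low_bits=4, high_bits=16, threshold=0.2):
    """Run the low-precision stage first; forward only uncertain inputs."""
    probs = classify(quantise_weights(weights, low_bits), x)
    if is_confident(probs, threshold):
        # Confident result: terminate computation after the cheap stage.
        return probs.index(max(probs)), "low"
    # Uncertain result: recompute with the high-precision stage.
    probs = classify(quantise_weights(weights, high_bits), x)
    return probs.index(max(probs)), "high"


# Hypothetical two-class model and inputs.
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
print(cascade_predict([4.0, 0.0], toy_weights))  # clear-cut input
print(cascade_predict([1.0, 1.1], toy_weights))  # ambiguous input
```

The threshold controls the trade-off: raising it forwards more inputs to the high-precision stage, which protects accuracy at the cost of throughput.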

