LG AR · May 7, 2021

ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models

Matthias Wess, Matvey Ivanov, Anvesh Nookala, Christoph Unger, Alexander Wendt, Axel Jantsch

arXiv:2105.03176v16.511 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses latency estimation for DNN deployment on hardware accelerators and offers a tool to support design space exploration. It is incremental in that it builds on existing estimation methods.

The paper tackles the problem of estimating inference latency for DNNs on hardware accelerators by proposing ANNETTE, a framework that uses stacked models from benchmarks, achieving average errors of 3.47% on DNNDK and 7.44% on NCS2 across 12 networks.

With new accelerator hardware for DNNs, the computing power for AI applications has increased rapidly. However, as DNN algorithms become more complex and optimized for specific applications, latency requirements remain challenging, and it is critical to find the optimal points in the design space. To decouple the architectural search from the target hardware, we propose a time estimation framework that models the inference latency of DNNs on hardware accelerators based on mapping and layer-wise estimation models. The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation. For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models and statistical models against the roofline model and a refined roofline model. We test the mixed models on the ZCU102 SoC board with DNNDK and on the Intel Neural Compute Stick 2 (NCS2) with a set of 12 state-of-the-art neural networks. The mixed models show an average estimation error of 3.47% for the DNNDK and 7.44% for the NCS2, outperforming the statistical and analytical layer models for almost all selected networks. For a randomly selected subset of 34 networks from the NASBench dataset, the mixed model reaches a fidelity of 0.988 as measured by Spearman's rank correlation coefficient. The code of ANNETTE is publicly available at https://github.com/embedded-machine-learning/annette.
