Hidden costs for inference with deep network on embedded system devices
This work addresses the challenge of optimizing deep learning for real-time performance on embedded devices, but it is incremental as it highlights overlooked computational aspects without introducing a new method.
The study tackles the problem of accurately estimating inference time for deep learning models on embedded systems. It shows that the commonly used Multiply-Accumulate metric is insufficient: experiments with ten models on CIFAR-100 reveal discrepancies between theoretical calculations and actual inference times.
This study evaluates the inference performance of various deep learning models in an embedded system environment. Previous work typically uses the number of Multiply-Accumulate (MAC) operations to measure the computational load of a deep model. According to this study, however, this metric is limited in its ability to estimate inference time on embedded devices. The paper asks which aspects are overlooked when computational load is expressed in terms of Multiply-Accumulate operations. In the experiments, an image classification task is performed on an embedded device using the CIFAR-100 dataset, and the measured inference times of ten deep models are compared with the theoretically calculated Multiply-Accumulate operations for each model. The results highlight the importance of considering additional computations between tensors when optimizing deep learning models for real-time performance in embedded systems.
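As a rough illustration of the comparison described above, the following sketch counts the Multiply-Accumulate operations of a convolutional block and, separately, the elements touched by a tensor-to-tensor operation that a Multiply-Accumulate count leaves out. It is not the paper's code. The residual block, its layer sizes, and the helper names (`conv2d_macs`, `elementwise_ops`, `measure_latency`) are hypothetical, and treating a skip-connection addition as one of the "additional computations between tensors" is an assumption made for the example.

```python
import statistics
import time


def conv2d_macs(c_in, c_out, kernel, h_out, w_out):
    """Multiply-Accumulate count of a standard 2D convolution (bias ignored)."""
    return c_in * c_out * kernel * kernel * h_out * w_out


def elementwise_ops(channels, height, width):
    """Elements touched by one tensor-to-tensor op, e.g. a residual addition.

    These operations contain no multiply-accumulate, so a MAC count
    records them as zero cost even though they read and write memory.
    """
    return channels * height * width


def measure_latency(fn, warmup=5, runs=30):
    """Median wall-clock time of fn() in seconds, after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical residual block: two 3x3 convolutions with 64 channels
# on a 32x32 feature map, followed by one skip-connection addition.
macs = 2 * conv2d_macs(64, 64, 3, 32, 32)
extra = elementwise_ops(64, 32, 32)

print(f"theoretical MACs:            {macs}")
print(f"elementwise ops not counted: {extra}")
```

On a target device, `measure_latency` could wrap a model's forward pass so that the measured time can be set against the theoretical count. How much the uncounted operations contribute to latency depends on the hardware and runtime, which is why a measurement on the device itself is needed.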