Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization
This work addresses the limited computation and storage resources available to DNN applications on mobile devices, but it appears incremental because it builds on existing pruning and optimization methods.
The paper tackles the challenge of real-time DNN inference on mobile platforms by proposing structured model pruning and compiler optimization techniques, and it achieves real-time execution for applications such as style transfer and super-resolution.
High-end mobile platforms are rapidly becoming the primary computing devices for a wide range of Deep Neural Network (DNN) applications. However, the limited computation and storage resources on these devices still pose significant challenges for real-time DNN inference. To address this problem, we propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN execution on mobile devices. This demo shows that these optimizations can enable real-time mobile execution of multiple DNN applications, including style transfer, DNN coloring, and super-resolution.
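Structured pruning is usually called "hardware-friendly" because it removes whole units (such as entire convolution filters) rather than scattered individual weights, so the pruned layer stays a smaller dense tensor that needs no sparse indexing at run time. The sketch below illustrates that idea with generic magnitude-based filter pruning (ranking filters by L2 norm). It is an assumption-laden illustration, not the authors' pruning scheme, and it does not model the compiler optimizations at all. The function names `prune_filters` and `prune_input_channels`, the channel-major flattened weight layout, and the `keep_ratio` parameter are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical layout: one output filter flattened channel-major, so input
# channel c occupies positions [c * kernel_area, (c + 1) * kernel_area).
Filter = List[float]


def l2_norm(weights: Filter) -> float:
    """Euclidean norm of a single flattened filter (used as its importance score)."""
    return math.sqrt(sum(w * w for w in weights))


def prune_filters(filters: List[Filter], keep_ratio: float) -> Tuple[List[Filter], List[int]]:
    """Keep the filters with the largest L2 norms (generic structured pruning sketch).

    Returns the surviving filters, still dense, plus their original indices.
    The next layer needs those indices to drop the matching input channels.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Number of filters to keep, rounded to nearest and never below one.
    n_keep = max(1, round(len(filters) * keep_ratio))
    # Rank filter indices by importance, largest norm first.
    ranked = sorted(range(len(filters)), key=lambda i: l2_norm(filters[i]), reverse=True)
    # Restore the original ordering among the kept filters.
    kept = sorted(ranked[:n_keep])
    return [filters[i] for i in kept], kept


def prune_input_channels(next_filters: List[Filter], kept: List[int], kernel_area: int) -> List[Filter]:
    """Drop the input-channel slices of the next layer that were fed by pruned filters.

    Because whole channels disappear, the result is again a smaller dense layer.
    """
    return [
        [w for c in kept for w in f[c * kernel_area:(c + 1) * kernel_area]]
        for f in next_filters
    ]


# Usage: four 1-channel 2x2 filters (flattened), keep the stronger half.
layer1 = [
    [0.1, -0.1, 0.0, 0.1],   # small norm
    [2.0, 1.0, -1.0, 0.5],   # norm 2.5
    [0.0, 0.0, 0.05, 0.0],   # smallest norm
    [1.0, -1.0, 1.0, -1.0],  # norm 2.0
]
pruned, kept = prune_filters(layer1, keep_ratio=0.5)
print(kept)  # [1, 3]

# Next layer: one filter over 4 input channels with kernel_area = 2.
layer2 = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]]
print(prune_input_channels(layer2, kept, kernel_area=2))  # [[2.0, 3.0, 6.0, 7.0]]
```

Pruning a layer's output filters and the next layer's input channels together is what keeps both layers dense; a real pipeline would also retrain or fine-tune after pruning to recover accuracy, which this sketch omits.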