AI | Dec 1, 2024

AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices

Yuzhan Wang, Sicong Liu, Bin Guo, Boqi Zhang, Ke Ma, Yasan Ding, Hao Luo, Yao Li, Zhiwen Yu

arXiv:2412.00724v1 | 4.28 citations | h-index: 9 | IEEE Internet of Things Journal

Originality: Incremental advance

AI Analysis

This work addresses the labor-intensive and inefficient process of adapting models to mobile devices with varying resource constraints, and offers an incremental improvement over existing compression and scaling techniques.

The paper tackles the challenge of adapting deep neural networks to dynamic and diverse mobile deployment contexts. It introduces AdaScale, an elastic inference framework that automates model adaptation. The reported gains are a 5.09% accuracy increase, a 66.89% reduction in training overhead, 1.51 to 6.2 times faster inference, and 4.69 times lower energy cost.

Deep learning is reshaping mobile applications, with a growing trend of deploying deep neural networks (DNNs) directly to mobile and embedded devices to address real-time performance and privacy. To accommodate local resource limitations, techniques like weight compression, convolution decomposition, and specialized layer architectures have been developed. However, the dynamic and diverse deployment contexts of mobile devices pose significant challenges. Adapting deep models to meet varied device-specific requirements for latency, accuracy, memory, and energy is labor-intensive. Additionally, changing processor states, fluctuating memory availability, and competing processes frequently necessitate model re-compression to preserve user experience. To address these issues, we introduce AdaScale, an elastic inference framework that automates the adaptation of deep models to dynamic contexts. AdaScale leverages a self-evolutionary model to streamline network creation, employs diverse compression operator combinations to reduce the search space and improve outcomes, and integrates a resource availability awareness block and performance profilers to establish an automated adaptation loop. Our experiments demonstrate that AdaScale improves accuracy by 5.09%, reduces training overhead by 66.89%, speeds up inference by 1.51 to 6.2 times, and lowers energy costs by 4.69 times.
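The adaptation loop described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the operator names only echo the techniques the abstract lists, and every number, class, and function here (`OPERATORS`, `BASE`, `profile`, `select`, `adaptation_loop`) is a hypothetical stand-in chosen for illustration. The sketch assumes that each combination of compression operators yields a model variant with a profiled cost, and that the loop picks the most accurate variant that fits the currently observed resource budgets.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Variant:
    operators: Tuple[str, ...]  # compression operators applied to the base model
    accuracy: float             # fraction of correct predictions
    latency_ms: float
    memory_mb: float
    energy_mj: float


@dataclass(frozen=True)
class Context:
    """Resource budgets observed at one point in time (hypothetical units)."""
    latency_budget_ms: float
    memory_budget_mb: float
    energy_budget_mj: float


# Made-up per-operator effects:
# (accuracy delta, latency factor, memory factor, energy factor)
OPERATORS = {
    "weight_compression": (-0.010, 0.80, 0.50, 0.85),
    "conv_decomposition": (-0.015, 0.60, 0.70, 0.65),
    "lightweight_layers": (-0.020, 0.50, 0.60, 0.55),
}

# Made-up profile of the uncompressed model.
BASE = Variant((), 0.92, 120.0, 200.0, 300.0)


def profile(operators: Tuple[str, ...]) -> Variant:
    """Stand-in for a performance profiler: estimate the cost of a variant."""
    acc, lat, mem, en = BASE.accuracy, BASE.latency_ms, BASE.memory_mb, BASE.energy_mj
    for name in operators:
        d_acc, f_lat, f_mem, f_en = OPERATORS[name]
        acc += d_acc
        lat *= f_lat
        mem *= f_mem
        en *= f_en
    return Variant(tuple(operators), acc, lat, mem, en)


def candidates() -> List[Variant]:
    """Enumerate every operator combination, including the uncompressed model."""
    names = sorted(OPERATORS)
    return [profile(c) for r in range(len(names) + 1) for c in combinations(names, r)]


def select(context: Context, variants: List[Variant]) -> Variant:
    """Pick the most accurate variant that fits all budgets."""
    feasible = [
        v for v in variants
        if v.latency_ms <= context.latency_budget_ms
        and v.memory_mb <= context.memory_budget_mb
        and v.energy_mj <= context.energy_budget_mj
    ]
    if feasible:
        return max(feasible, key=lambda v: v.accuracy)
    # Nothing fits: fall back to the fastest variant.
    return min(variants, key=lambda v: v.latency_ms)


def adaptation_loop(contexts: List[Context]) -> List[Variant]:
    """Re-select a variant each time the observed context changes."""
    variants = candidates()
    return [select(ctx, variants) for ctx in contexts]


if __name__ == "__main__":
    trace = [Context(150.0, 250.0, 400.0), Context(70.0, 150.0, 200.0)]
    for ctx, chosen in zip(trace, adaptation_loop(trace)):
        print(ctx, "->", chosen.operators or ("uncompressed",))
```

In this toy setting the exhaustive enumeration is cheap because there are only three operators. The abstract indicates that AdaScale instead reduces the search space through its choice of operator combinations, which this sketch does not attempt to model.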
