LG, AR | May 11, 2021

3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration

Yao Chen, Cole Hawkins, Kaiqi Zhang, Zheng Zhang, Cong Hao

arXiv:2105.06250v1 | 5.5 | 10 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of efficient AI deployment on resource-constrained edge devices, presenting incremental improvements in training, quantization, and acceleration.

The paper tackles the challenge of running deep neural networks on edge devices with limited memory, computing, and power by proposing a rank-adaptive tensorized model for ultra-low memory training, an ultra-low bitwidth quantization method for state-of-the-art compression, and an ultra-low latency accelerator design.
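To make the link between bitwidth and compression ratio concrete, here is a minimal sketch of plain symmetric uniform quantization. It is not the paper's quantization method, which this summary does not describe; the function names `quantize_uniform`, `dequantize`, and `compression_ratio` are hypothetical and chosen only for illustration.

```python
def quantize_uniform(weights, bits):
    """Map floats to signed integer codes in [-qmax, qmax] (illustrative only)."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits (sign + magnitude)")
    qmax = 2 ** (bits - 1) - 1            # largest positive code
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def compression_ratio(full_bits, bits):
    """Storage ratio for the weights alone, ignoring the per-tensor scale."""
    return full_bits / bits


weights = [-1.0, -0.2, 0.0, 0.3, 1.0]
codes, scale = quantize_uniform(weights, bits=2)
approx = dequantize(codes, scale)
ratio = compression_ratio(32, 2)          # 32-bit floats stored as 2-bit codes
```

At 2 bits only the codes -1, 0, and 1 are available, so small weights collapse to zero. That loss of precision is why accuracy at a fixed compression ratio is the metric the summary highlights.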

Deep neural network (DNN) based AI applications on the edge require both low-cost computing platforms and high-quality services. However, the limited memory, computing resources, and power budget of edge devices constrain the effectiveness of DNN algorithms, which makes developing edge-oriented AI algorithms and implementations (e.g., accelerators) challenging. In this paper, we summarize our recent efforts toward efficient on-device AI development from three aspects, covering both training and inference. First, we present on-device training with ultra-low memory usage: we propose a novel rank-adaptive tensorized neural network model, which offers orders-of-magnitude memory reduction during training. Second, we introduce an ultra-low bitwidth quantization method for DNN model compression, achieving state-of-the-art accuracy under the same compression ratio. Third, we introduce an ultra-low latency DNN accelerator design that follows the software/hardware co-design methodology. This paper emphasizes the importance and efficacy of training, quantization, and accelerator design, and calls for more research breakthroughs in AI on the edge.
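The orders-of-magnitude memory claim can be illustrated with simple parameter counting. The sketch below assumes a tensor-train (TT) factorization of a dense layer's weight matrix with fixed ranks. The abstract does not name the tensor format, and the paper's model adapts its ranks during training, so both the format and the ranks here are assumptions. The helper names `dense_params` and `tt_params` are hypothetical.

```python
from math import prod


def dense_params(in_dims, out_dims):
    """Parameter count of an unfactorized weight matrix."""
    return prod(in_dims) * prod(out_dims)


def tt_params(in_dims, out_dims, ranks):
    """Parameter count of a TT-matrix with cores of shape (r_k, m_k, n_k, r_{k+1})."""
    d = len(in_dims)
    if len(out_dims) != d or len(ranks) != d + 1:
        raise ValueError("need d input dims, d output dims, and d + 1 ranks")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    return sum(
        ranks[k] * in_dims[k] * out_dims[k] * ranks[k + 1] for k in range(d)
    )


# A 1024 x 1024 layer, with each side factored as 4 * 8 * 8 * 4 (assumed shapes).
in_dims = (4, 8, 8, 4)
out_dims = (4, 8, 8, 4)
ranks = (1, 8, 8, 8, 1)   # fixed ranks, chosen only for illustration

dense = dense_params(in_dims, out_dims)
tt = tt_params(in_dims, out_dims, ranks)
reduction = dense / tt
```

With these assumed shapes the dense layer holds 1,048,576 parameters and the factorized one 8,448, a reduction of roughly 124x. Lower ranks shrink the model further at the cost of expressiveness, which is the trade-off a rank-adaptive scheme is meant to tune.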
