AI · Sep 16, 2023

Empowering In-Browser Deep Learning Inference on Edge Devices with Just-in-Time Kernel Optimizations

arXiv:2309.08978v2 · 7 citations · h-index: 16
Originality: Highly original
AI Analysis

This paper addresses the efficient delivery of AI services to edge devices, a problem relevant to web developers and users, and contributes incremental improvements in kernel optimization for Web environments.

The paper tackles the performance limitations of in-browser deep learning inference on edge devices, which stem from hardware heterogeneity and underdeveloped Web hardware acceleration. It presents nnJIT, a system that achieves up to 8.2x faster inference than existing baselines within 30 seconds.

The Web is increasingly becoming the primary platform for delivering AI services to edge devices, making in-browser deep learning (DL) inference more prominent. Nevertheless, the heterogeneity of edge devices, combined with the underdeveloped state of Web hardware acceleration practices, prevents current in-browser inference from reaching its full performance potential on target devices. To address this issue, this paper presents nnJIT, a pioneering in-browser inference system that enables just-in-time (JIT) auto-generation of optimized computing kernels for edge devices. nnJIT is built upon two novel techniques that significantly reduce kernel search and compilation overhead while improving performance: Tensor-Web Compiling Co-Design lowers compiling costs by around 100X by eliminating redundant and ineffective compiling passes; Web-Specific Lite Kernel Optimization Space reduces kernel tuning costs by focusing on Web programming requirements and efficient device resource utilization, pruning the optimization space from millions to only dozens. nnJIT is evaluated on modern models, e.g., BART, T5, and Llama 2, on a range of edge devices including laptops and smartphones, using different browsers and hardware from ARM, Intel, AMD, and Nvidia. The results show that nnJIT can achieve up to 8.2X faster inference within 30 seconds compared to the existing baselines.
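
The idea of tuning a kernel over a deliberately small candidate space under a time budget can be illustrated with a toy example. The sketch below is not nnJIT's implementation: the function names (`tiled_matmul`, `lite_space`, `jit_tune`), the tile sizes, and the pure-Python matrix multiply are all assumptions chosen for illustration. It only shows the general pattern of timing a handful of candidate kernel configurations and keeping the fastest one found before the budget runs out.

```python
import itertools
import time


def tiled_matmul(a, b, n, tile_i, tile_k):
    """Multiply two n x n matrices (lists of lists) using loop tiling."""
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile_i):
        for k0 in range(0, n, tile_k):
            for i in range(i0, min(i0 + tile_i, n)):
                row_a, row_c = a[i], c[i]
                for k in range(k0, min(k0 + tile_k, n)):
                    a_ik, row_b = row_a[k], b[k]
                    for j in range(n):
                        row_c[j] += a_ik * row_b[j]
    return c


def lite_space(n, tile_options=(4, 8, 16)):
    """Hypothetical pruned space: a few tile sizes no larger than n."""
    sizes = [t for t in tile_options if t <= n] or [n]
    return list(itertools.product(sizes, sizes))


def jit_tune(a, b, n, budget_s=30.0):
    """Time each candidate once; stop early if the budget is exhausted."""
    start = time.perf_counter()
    best_cfg, best_time = None, float("inf")
    for cfg in lite_space(n):
        # Always measure at least one candidate before honoring the budget.
        if best_cfg is not None and time.perf_counter() - start > budget_s:
            break
        t0 = time.perf_counter()
        tiled_matmul(a, b, n, *cfg)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time
```

With only a few tile sizes per dimension, the space holds at most nine candidates, so exhaustive timing is cheap. A real system would generate and compile device-specific kernels rather than reuse one Python function, and would measure each candidate more than once to reduce timing noise.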

