PLMay 2
DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor ProgramsSize Zheng, Xuegui Zheng, Hanshi Sun et al.
The scaling of large language models (LLMs) is currently bottlenecked by the rigidity of distributed programming. While high-performance libraries like CuBLAS and NCCL provide optimized primitives, they lack the flexibility required for rapidly evolving model architectures. Conversely, existing tensor compilers fail to address the complex memory hierarchy of distributed clusters effectively. To bridge this gap, we propose DITRON, a scalable tile-level compiler that democratizes high-performance distributed kernel development. DITRON introduces a novel hierarchical programming abstraction spanning Core, Device, and Task levels to map tensor programs efficiently onto heterogeneous distributed hardware. This abstraction allows DITRON to support diverse parallelism strategies while abstracting away the complexity of inter-node and intra-node communication. Evaluated across large-scale clusters, DITRON achieves performance parity with or exceeding expert-tuned CUDA libraries, delivering speedups of $6\%-30\%$ on isolated kernels and $5\%-30\%$ on end-to-end inference in vLLM. Furthermore, DITRON demonstrates strong portability, achieving significant speedups on both NVIDIA and AMD platforms. \ours{} has been deployed at the enterprise level for both training and inference. It achieves an MFU improvement of over 10\% in training tasks, saving approximately 500,000 GPU hours of training cost per month. For inference tasks, it delivers an end-to-end gain of over 20\% and has been applied to cloud service inference and edge inference scenarios.
SEJan 13, 2021
An Empirical Study on Deployment Faults of Deep Learning Based Mobile ApplicationsZhenpeng Chen, Huihan Yao, Yiling Lou et al.
Deep Learning (DL) is finding its way into a growing number of mobile software applications. These software applications, named as DL based mobile applications (abbreviated as mobile DL apps) integrate DL models trained using large-scale data with DL programs. A DL program encodes the structure of a desirable DL model and the process by which the model is trained using training data. Due to the increasing dependency of current mobile apps on DL, software engineering (SE) for mobile DL apps has become important. However, existing efforts in SE research community mainly focus on the development of DL models and extensively analyze faults in DL programs. In contrast, faults related to the deployment of DL models on mobile devices (named as deployment faults of mobile DL apps) have not been well studied. Since mobile DL apps have been used by billions of end users daily for various purposes including for safety-critical scenarios, characterizing their deployment faults is of enormous importance. To fill the knowledge gap, this paper presents the first comprehensive study on the deployment faults of mobile DL apps. We identify 304 real deployment faults from Stack Overflow and GitHub, two commonly used data sources for studying software faults. Based on the identified faults, we construct a fine-granularity taxonomy consisting of 23 categories regarding to fault symptoms and distill common fix strategies for different fault types. Furthermore, we suggest actionable implications and research avenues that could further facilitate the deployment of DL models on mobile devices.
SEMay 2, 2020
A Comprehensive Study on Challenges in Deploying Deep Learning Based SoftwareZhenpeng Chen, Yanbin Cao, Yuanqiang Liu et al.
Deep learning (DL) becomes increasingly pervasive, being used in a wide range of software applications. These software applications, named as DL based software (in short as DL software), integrate DL models trained using a large data corpus with DL programs written based on DL frameworks such as TensorFlow and Keras. A DL program encodes the network structure of a desirable DL model and the process by which the model is trained using the training data. To help developers of DL software meet the new challenges posed by DL, enormous research efforts in software engineering have been devoted. Existing studies focus on the development of DL software and extensively analyze faults in DL programs. However, the deployment of DL software has not been comprehensively studied. To fill this knowledge gap, this paper presents a comprehensive study on understanding challenges in deploying DL software. We mine and analyze 3,023 relevant posts from Stack Overflow, a popular Q&A website for developers, and show the increasing popularity and high difficulty of DL software deployment among developers. We build a taxonomy of specific challenges encountered by developers in the process of DL software deployment through manual inspection of 769 sampled posts and report a series of actionable implications for researchers, developers, and DL framework vendors.
LGNov 8, 2018
A First Look at Deep Learning Apps on SmartphonesMengwei Xu, Jiawei Liu, Yuanqiang Liu et al.
We are in the dawn of deep learning explosion for smartphones. To bridge the gap between research and practice, we present the first empirical study on 16,500 the most popular Android apps, demystifying how smartphone apps exploit deep learning in the wild. To this end, we build a new static tool that dissects apps and analyzes their deep learning functions. Our study answers threefold questions: what are the early adopter apps of deep learning, what do they use deep learning for, and how do their deep learning models look like. Our study has strong implications for app developers, smartphone vendors, and deep learning R\&D. On one hand, our findings paint a promising picture of deep learning for smartphones, showing the prosperity of mobile deep learning frameworks as well as the prosperity of apps building their cores atop deep learning. On the other hand, our findings urge optimizations on deep learning models deployed on smartphones, the protection of these models, and validation of research ideas on these models.