LG AI DC | Nov 22, 2023

Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training

Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Jiming Chen

arXiv:2311.13381v1 | 3.89 citations | h-index: 50

Originality: Incremental advance

AI Analysis

This paper addresses efficient LLM customization on mobile devices for edge computing applications and represents an incremental improvement in distributed training methods.

The paper tackles the challenge of deploying and fine-tuning large language models on resource-constrained mobile edge devices. It proposes Confidant, a collaborative training framework that partitions a model into sub-models across devices and trains them with pipeline parallelism, reporting up to 45.3% memory reduction and 8.03x inference speedup in preliminary experiments.

Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks. Nonetheless, it is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. In this paper, we propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices like smartphones. Confidant partitions an LLM into several sub-models so that each fits into a mobile device's memory. A pipeline parallel training mechanism is further developed to ensure fast and efficient distributed training. In addition, we propose a novel backend scheduler to allocate different attention heads to heterogeneous compute hardware, including mobile CPU and GPUs, to maximize the compute resource utilization on each edge device. Our preliminary experimental results show that Confidant achieves at most 45.3% memory reduction and 8.03x inference speedup in practical settings.
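The abstract does not spell out how layers are partitioned or how attention heads are scheduled, so the following is only a minimal sketch of the two ideas under stated assumptions, not Confidant's actual algorithm. It assumes per-layer memory footprints and per-device memory budgets are known in advance, assigns contiguous layers to devices greedily, and splits attention heads across backends in proportion to a measured relative throughput. The function names `partition_layers` and `allocate_heads`, and all numbers used with them, are hypothetical.

```python
from typing import Dict, List


def partition_layers(layer_mem_mb: List[float],
                     device_mem_mb: List[float]) -> List[List[int]]:
    """Greedily assign contiguous layers to devices (hypothetical helper).

    Each device receives a run of consecutive layers whose total memory
    stays within that device's budget, forming one pipeline stage.
    """
    stages: List[List[int]] = [[] for _ in device_mem_mb]
    dev = 0
    used = 0.0
    for idx, mem in enumerate(layer_mem_mb):
        # Move on to the next device until this layer fits.
        while dev < len(device_mem_mb) and used + mem > device_mem_mb[dev]:
            dev += 1
            used = 0.0
        if dev == len(device_mem_mb):
            raise ValueError(f"layer {idx} does not fit on the remaining devices")
        stages[dev].append(idx)
        used += mem
    return stages


def allocate_heads(num_heads: int,
                   throughput: Dict[str, float]) -> Dict[str, int]:
    """Split attention heads across backends by relative throughput.

    Assumption: throughput is a measured, positive score per backend
    (e.g. "cpu", "gpu"). Largest-remainder rounding keeps the total exact.
    """
    total = sum(throughput.values())
    if total <= 0:
        raise ValueError("throughput values must sum to a positive number")
    shares = {name: num_heads * t / total for name, t in throughput.items()}
    alloc = {name: int(share) for name, share in shares.items()}
    leftover = num_heads - sum(alloc.values())
    # Give the remaining heads to the backends with the largest fractions.
    by_fraction = sorted(shares, key=lambda n: shares[n] - alloc[n], reverse=True)
    for name in by_fraction[:leftover]:
        alloc[name] += 1
    return alloc


if __name__ == "__main__":
    # Six equally sized layers spread over three devices with 250 MB each.
    print(partition_layers([100.0] * 6, [250.0] * 3))
    # Twelve heads split between a CPU and a GPU that is three times faster.
    print(allocate_heads(12, {"cpu": 1.0, "gpu": 3.0}))
```

A greedy contiguous split keeps each pipeline stage a consecutive block of layers, so activations cross a device boundary only once per stage. A real scheduler would likely also balance per-stage compute time and communication cost, which this sketch ignores.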
