SY AI NI Sep 24, 2025

CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks

Jiewei Chen, Xiumei Deng, Zehui Xiong, Shaoyong Guo, Xuesong Qiu, Ping Wang, Dusit Niyato

arXiv:2509.19855v1 h-index: 116

Originality: Incremental advance

AI Analysis

This work addresses efficient LLM training for multi-agent collaboration in mobile edge computing networks. It is an incremental advance that combines existing techniques, such as pipeline parallelism and federated learning, with adaptive optimizations.

The paper tackles the challenge of training large language models (LLMs) in heterogeneous edge networks by introducing CollaPipe, a hybrid distributed learning framework that integrates collaborative pipeline parallelism with federated aggregation. Reported gains include up to 15.09% higher computation efficiency, at least 48.98% lower end-to-end latency, and single-device memory usage cut by more than half.

The increasing demand for intelligent mobile applications has made multi-agent collaboration with Transformer-based large language models (LLMs) essential in mobile edge computing (MEC) networks. However, training LLMs in such environments remains challenging due to heavy computation, high end-to-end latency, and limited model generalization. We introduce CollaPipe, a hybrid distributed learning framework that integrates collaborative pipeline parallelism with federated aggregation to support self-evolving intelligent networks. In CollaPipe, the encoder part is adaptively partitioned into variable-sized segments and deployed across mobile devices for pipeline-parallel training, while the decoder is deployed on edge servers to handle generative tasks. We then perform the global model update via federated aggregation. To enhance training efficiency, we formulate a joint optimization problem that adaptively allocates model segments, micro-batches, bandwidth, and transmission power. We derive and use a closed-form convergence bound to design a Dynamic Segment Scheduling and Resource Allocation (DSSDA) algorithm based on Lyapunov optimization, ensuring system stability under long-term constraints. Extensive experiments on downstream tasks with Transformer and BERT models show that CollaPipe improves computation efficiency by up to 15.09%, reduces end-to-end latency by at least 48.98%, and cuts single-device memory usage by more than half, enabling online learning in heterogeneous and dynamic communication environments.
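The two structural ideas in the abstract, variable-sized encoder segments across devices and a federated global update, can be illustrated with a minimal Python sketch. This is not the paper's DSSDA algorithm: the segment sizing below is a simple proportional heuristic standing in for the joint optimization, and the aggregation is a plain weighted average in the style of FedAvg. The function names `partition_layers` and `federated_average`, the capacity values, and the toy parameter dictionaries are all hypothetical.

```python
from typing import Dict, List


def partition_layers(num_layers: int, capacities: List[float]) -> List[int]:
    """Split encoder layers into contiguous segments, one per device.

    Assumption: segment size is roughly proportional to a scalar
    'capacity' per device. The paper instead solves a joint
    optimization; this heuristic only illustrates variable-sized segments.
    """
    if num_layers < len(capacities):
        raise ValueError("need at least one layer per device")
    total = sum(capacities)
    # Ideal (fractional) share of layers for each device.
    ideal = [c * num_layers / total for c in capacities]
    # Every device starts with one layer so no pipeline stage is empty.
    sizes = [1] * len(capacities)
    for _ in range(num_layers - len(capacities)):
        # Give the next layer to the device furthest below its ideal share.
        i = max(range(len(sizes)), key=lambda k: ideal[k] - sizes[k])
        sizes[i] += 1
    return sizes


def federated_average(
    models: List[Dict[str, List[float]]], weights: List[float]
) -> Dict[str, List[float]]:
    """Weighted average of per-cluster parameters (FedAvg-style).

    Assumption: every model has the same parameter names and shapes,
    and weights are e.g. local sample counts.
    """
    total = sum(weights)
    averaged: Dict[str, List[float]] = {}
    for name in models[0]:
        length = len(models[0][name])
        averaged[name] = [
            sum(w * m[name][j] for m, w in zip(models, weights)) / total
            for j in range(length)
        ]
    return averaged


if __name__ == "__main__":
    # Hypothetical: 12 encoder layers over three devices of unequal capacity.
    print(partition_layers(12, [1.0, 2.0, 3.0]))
    # Hypothetical: two clusters with 1 and 3 units of local data.
    print(federated_average([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}], [1.0, 3.0]))
```

In a real deployment the capacities would change over time with channel and device conditions, which is the case the paper's Lyapunov-based scheduling is designed to handle; the sketch above is static.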
