Songge Zhang

DC
h-index4
4papers
13citations
Novelty51%
AI Score44

4 Papers

56.4DCMay 6
Delay-Aware Large-Small Model Collaboration over LEO Satellite Networks

Mingyu Guo, Wen Wu, Ying Wang et al.

In this paper, we introduce a delay-aware largesmall model collaboration scheme for low Earth orbit (LEO) satellite networks, which can balance the computational load among satellites and the communication load across inter-satellite links. Specifically, computational resource constrained remote sensing satellites are responsible for data collection and local processing using small models, while collaborating with computing satellites that provide large model processing. To minimize the service delay, we formulate a joint optimization problem for offloading decision and routing strategy design, which is transformed into a decentralized partially observable Markov decision process. To solve the problem, we develop a multi-agent reinforcement learning (MARL)-based algorithm with offline policy training and online bisection search. The offline trained policy determines routing strategies, while online bisection search iteratively adjusts the offloading decisions. Simulation results demonstrate that the proposed scheme can reduce the service delay by up to 31.85% compared with the benchmarks.

LGNov 27, 2024
Energy-Efficient Split Learning for Fine-Tuning Large Language Models in Edge Networks

Zuguang Li, Shaohua Wu, Liang Li et al.

In this letter, we propose an energy-efficient split learning (SL) framework for fine-tuning large language models (LLMs) using geo-distributed personal data at the network edge, where LLMs are split and alternately across massive mobile devices and an edge server. Considering the device heterogeneity and channel dynamics in edge networks, a \underline{C}ut l\underline{A}yer and computing \underline{R}esource \underline{D}ecision (CARD) algorithm is developed to minimize training delay and energy consumption. Simulation results demonstrate that the proposed approach reduces the average training delay and server's energy consumption by 70.8% and 53.1%, compared to the benchmarks, respectively.

72.3DCApr 6
Communication-Efficient Collaborative LLM Inference over LEO Satellite Networks

Songge Zhang, Wen Wu, Liang Li et al.

Low Earth orbit (LEO) satellites play an essential role in intelligent Earth observation by leveraging artificial intelligence models. However, limited onboard memory and excessive inference delay prevent the practical deployment of large language models (LLMs) on a single satellite. In this paper, we propose a communication-efficient collaborative LLM inference scheme for LEO satellite networks. Specifically, the entire LLM is split into multiple sub-models, with each deployed on a satellite, thereby enabling collaborative LLM inference via exchanging intermediate activations between satellites. The proposed scheme also leverages the pipeline parallelism mechanism that overlaps sub-model inference with intermediate activation transmission, thereby reducing LLM inference delay. An adaptive activation compression scheme is designed to mitigate cumulative errors from multi-stage model splitting while preserving inference accuracy. Furthermore, we formulate the LLM inference delay minimization problem by jointly optimizing model splitting and compression ratios under onboard memory and inference accuracy constraints. The problem is transformed into a shortest-path search problem over a directed acyclic graph that edge weights explicitly quantify the inference delay induced by model splitting and compression strategies, which is solved via a modified A Star-based search algorithm. Extensive simulation results indicate that the proposed solution can reduce inference delay by up to 42% and communication overhead by up to 71% compared to state-of-the-art benchmarks, while maintaining the inference accuracy loss of less than 1%.

LGJun 23, 2025
Fast AI Model Splitting over Edge Networks

Zuguang Li, Wen Wu, Shaohua Wu et al.

Split learning (SL) has emerged as a computationally efficient approach for artificial intelligence (AI) model training, which can alleviate device-side computational workloads. However, complex AI model architectures pose high computational complexity to obtain the optimal model splitting. In this paper, we represent an arbitrary AI model as a directed acyclic graph (DAG), and then reformulate the optimal model splitting problem as a minimum s-t cut search problem. To solve the problem, we propose a fast DAG-based model splitting algorithm, which restructures the DAG to enable the optimal model splitting identification via a maximum flow method. Theoretical analysis indicates that the proposed algorithm is optimal. Furthermore, considering AI models with block structures, we propose a block-wise model splitting algorithm to reduce computational complexity. The algorithm abstracts each block, i.e., a component consisting of multiple layers, into a single vertex, thereby obtaining the optimal model splitting via a simplified DAG. Extensive experimental results demonstrate that the proposed algorithms can determine the optimal model splitting within milliseconds, as well as reduce training delay by 24.62%-38.95% in dynamic edge networks as compared to the state-of-the-art benchmarks.