TSFLora: Token-Compressed Split Fine-Tuning for Wireless Edge Networks
For wireless edge networks, TSFLora enables efficient fine-tuning of large models on resource-constrained devices by significantly reducing communication and memory overhead.
TSFLora addresses the challenge of adapting large AI models to edge devices with limited resources by combining token compression and split learning, achieving up to 6.8× communication reduction and 41% memory savings while maintaining competitive accuracy on image classification tasks.
Adapting large AI models (LAMs) to personalized edge data is challenging because wireless devices have limited memory, computation, and uplink capacity. Federated fine-tuning preserves data privacy but still requires each device to host the full model, while split learning reduces device memory at the cost of heavy activation transmission. This paper proposes TSFLora, a token-compressed split fine-tuning framework for communication-efficient LAM adaptation at the edge. TSFLora combines attention-guided token selection, token merging, low-bit activation quantization, and LoRA-based adaptation within a split federated training pipeline. The key idea is to compress the intermediate token sequence before transmission so that the system reduces both uplink traffic and server-side processing without changing the frozen backbone. Experiments on ViT models over CIFAR-10, CIFAR-100, and TinyImageNet show that TSFLora achieves up to 6.8× communication reduction and 41% memory saving while maintaining competitive accuracy.
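The compression step described above (select important tokens, merge the rest, quantize before uplink) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of a single averaged token for the discarded set, and the uniform min-max quantizer are all assumptions made for illustration, and the per-token scores stand in for whatever attention-derived importance measure TSFLora actually uses.

```python
from typing import List, Sequence, Tuple

Token = List[float]


def select_and_merge(tokens: Sequence[Token], scores: Sequence[float], keep: int) -> List[Token]:
    """Keep the `keep` highest-scoring tokens and average the rest into one token.

    Assumption: `scores` is a per-token importance value (for example, attention
    received from a class token). Averaging the dropped tokens is one simple
    merging rule, chosen here only for illustration.
    """
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_idx = sorted(order[:keep])  # restore original sequence order
    dropped_idx = order[keep:]
    out = [list(tokens[i]) for i in kept_idx]
    if dropped_idx:
        dim = len(tokens[0])
        merged = [sum(tokens[i][d] for i in dropped_idx) / len(dropped_idx) for d in range(dim)]
        out.append(merged)
    return out


def quantize(tokens: Sequence[Token], bits: int) -> Tuple[List[List[int]], float, float]:
    """Uniform min-max quantization to `bits` bits per value (hypothetical scheme)."""
    flat = [v for t in tokens for v in t]
    lo, hi = min(flat), max(flat)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [[round((v - lo) / scale) for v in t] for t in tokens]
    return codes, lo, scale


def dequantize(codes: Sequence[Sequence[int]], lo: float, scale: float) -> List[Token]:
    """Server-side reconstruction of the activations from integer codes."""
    return [[lo + c * scale for c in t] for t in codes]


def payload_bits(num_tokens: int, dim: int, bits: int) -> int:
    """Uplink payload size, ignoring the small overhead of `lo` and `scale`."""
    return num_tokens * dim * bits


# Toy example: 4 tokens of dimension 2, keep the 2 most important ones.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
scores = [0.1, 0.5, 0.3, 0.1]
compressed = select_and_merge(tokens, scores, keep=2)
codes, lo, scale = quantize(compressed, bits=4)
restored = dequantize(codes, lo, scale)
ratio = payload_bits(len(tokens), 2, 32) / payload_bits(len(compressed), 2, 4)
```

In this toy setting the payload shrinks for two independent reasons: fewer tokens are sent, and each value uses fewer bits. The resulting `ratio` is specific to these made-up numbers and says nothing about the figures reported in the paper.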