NI · Apr 26

Adaptive Swin Transformer Partitioning over AI-RAN Networks

Tam Thanh Nguyen, Yong Hao Pua, Tuan Van Ngo, Mao V. Ngo, Jihong Park, Binbin Chen, Tony Q. S. Quek

arXiv:2604.23554 · 49.9

Predicted impact: top 16% in NI · last 90 days
Originality: Incremental advance

AI Analysis

For researchers and engineers deploying transformer-based vision models in real-time 5G edge networks, this work provides a practical split inference system with adaptive partitioning and compression, though it is an incremental extension of prior CNN-based methods.

This paper demonstrates the feasibility of transformer-based split inference for real-time video object detection over dynamic 5G AI-RAN networks, achieving practical execution without retraining and reducing uplink payload via efficient activation compression. End-to-end validation on an NVIDIA Aerial testbed quantifies latency-energy-privacy trade-offs.

This paper demonstrates the feasibility of transformer-based split inference for real-time video object detection over dynamic 5G AI-RAN networks. We extend throughput-aware adaptive splitting from CNNs to a Swin Transformer backbone and show that practical split execution is achievable for transformer-based vision models without retraining. To address the large intermediate activations inherent to transformers, we introduce an efficient, accuracy-preserving activation compression pipeline that substantially reduces uplink payload. The complete system -- including adaptive split selection, transformer inference, and compression -- is implemented and validated end-to-end on a real-time detection workload, with distributed UPF (dUPF) integration further reducing user-plane latency and improving runtime stability. Extensive measurements on an NVIDIA Aerial-based AI-RAN testbed jointly account for inference and 5G communication energy, quantifying the latency-energy-privacy trade-offs in realistic deployments.
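The throughput-aware split selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `SplitOption` type, the split-point names, and every timing and payload figure below are hypothetical placeholders, and the latency model (device compute + uplink transfer + server compute) is an assumed simplification. It only shows how a measured uplink throughput and a compression ratio could drive the choice of split point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitOption:
    """One candidate split point in the backbone (all values hypothetical)."""
    name: str
    device_ms: float      # compute time on the user device up to the split
    server_ms: float      # compute time on the edge server after the split
    payload_bytes: int    # uncompressed activation size sent over the uplink


def estimated_latency_ms(opt, uplink_mbps, compression_ratio=1.0):
    """Assumed model: device compute + uplink transfer + server compute."""
    if uplink_mbps <= 0 or compression_ratio <= 0:
        raise ValueError("throughput and compression ratio must be positive")
    bits = opt.payload_bytes * 8 / compression_ratio
    transfer_ms = bits / (uplink_mbps * 1e6) * 1e3
    return opt.device_ms + transfer_ms + opt.server_ms


def choose_split(options, uplink_mbps, compression_ratio=1.0):
    """Pick the split point with the lowest estimated end-to-end latency."""
    return min(
        options,
        key=lambda o: estimated_latency_ms(o, uplink_mbps, compression_ratio),
    )


# Illustrative profile only; real values would come from on-device profiling.
OPTIONS = [
    SplitOption("after_patch_embed", device_ms=2.0, server_ms=40.0, payload_bytes=2_000_000),
    SplitOption("after_stage1", device_ms=10.0, server_ms=32.0, payload_bytes=1_000_000),
    SplitOption("after_stage2", device_ms=25.0, server_ms=20.0, payload_bytes=500_000),
    SplitOption("after_stage3", device_ms=45.0, server_ms=8.0, payload_bytes=250_000),
]

if __name__ == "__main__":
    for mbps in (20, 400):
        print(mbps, "Mbps ->", choose_split(OPTIONS, mbps).name)
    print("400 Mbps, 8x compression ->", choose_split(OPTIONS, 400, 8.0).name)
```

Under these placeholder numbers, a slow uplink pushes the split later in the network (smaller activations), while stronger compression makes an earlier split affordable. That is the kind of trade-off an adaptive splitter would re-evaluate as throughput changes.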
