NI · Apr 27
MatchRDMA: A Segmented and Rate-Matched Long-Haul RDMA Scheme for Geo-distributed LLM Training over OTN
Jun Dai, Xiaorun Wang, Xingde Li, Zheng Yang, Kexiong Fang, Zhiqun Gu, Hongxiang Wang, Yuefeng Ji, Jiawei Zhang
arXiv:2604.23932 · 87.5
AI Analysis
Addresses the bottleneck of long-haul RDMA for geo-distributed LLM training, enabling more efficient use of optical transport networks.
MatchRDMA improves inter-datacenter throughput by up to 20x and reduces buffer occupancy by up to 62.7% for geo-distributed LLM training over OTN.
We propose MatchRDMA, a proactive, segmented, and rate-matched long-haul RDMA scheme for geo-distributed LLM training over OTN. By coordinating source and destination OTN rates, it improves inter-DC throughput by up to 20x compared with conventional RDMA, and reduces destination-OTN buffer occupancy by up to 62.7%.