NI, Apr 27

MatchRDMA: A Segmented and Rate-Matched Long-Haul RDMA Scheme for Geo-distributed LLM Training over OTN

Jun Dai, Xiaorun Wang, Xingde Li, Zheng Yang, Kexiong Fang, Zhiqun Gu, Hongxiang Wang, Yuefeng Ji, Jiawei Zhang

arXiv:2604.23932

AI Analysis

Addresses the bottleneck of long-haul RDMA for geo-distributed LLM training, enabling more efficient use of optical transport networks.

MatchRDMA improves inter-datacenter throughput by up to 20x and reduces buffer occupancy by up to 62.7% for geo-distributed LLM training over OTN.

We propose MatchRDMA, a proactive, segmented, and rate-matched long-haul RDMA scheme for geo-distributed LLM training over OTN. By coordinating source and destination OTN rates, it improves inter-DC throughput by up to 20x compared with conventional RDMA, and reduces destination-OTN buffer occupancy by up to 62.7%.
