May 18
Modeling the Impact of Fiber Latency on Compute-Communication Overlap in Geo-Distributed Multi-Datacenter AI Training
Ioannis Papavasileiou, Sairam Prabhakar, Indu Kant Deo et al.
We use discrete-event simulation to quantify the impact of fiber latency on the efficacy of geo-distributed AI model training with data parallelism. We conclude that the optimal distance between two AI clusters is 10 to 100 km, and that over this range hollow-core fiber enables 25% higher compute-communication overlap.
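To make the mechanism concrete, here is a minimal toy discrete-event simulation of one data-parallel step between two sites. It is not the authors' simulator and does not try to reproduce their numbers. Every parameter is a hypothetical assumption: the fiber group indices (about 1.468 for standard single-mode fiber, about 1.0003 for hollow-core fiber), the bucket count and size, the per-bucket compute time, and the link rate. The model also assumes a single FIFO inter-site link, and it defines "overlap" as the fraction of communication time hidden behind the backward pass. The paper may define overlap differently. The backward pass emits gradient buckets at fixed intervals, each bucket is serialized onto the link, and it reaches the peer after the fiber propagation delay.

```python
import heapq

# Assumed approximate group indices (hypothetical values for illustration):
# standard single-mode fiber ~1.468 (~4.9 us/km), hollow-core fiber ~1.0003 (~3.3 us/km).
C_KM_PER_S = 299_792.458
DELAY_S_PER_KM = {
    "smf": 1.468 / C_KM_PER_S,
    "hcf": 1.0003 / C_KM_PER_S,
}


def simulate_step(distance_km, fiber, n_buckets=8, compute_per_bucket_s=5e-3,
                  bucket_bytes=50e6, link_bps=400e9):
    """Toy discrete-event simulation of one two-site data-parallel step.

    All parameters are hypothetical. The backward pass makes one gradient
    bucket ready every `compute_per_bucket_s` seconds. Buckets are serialized
    in FIFO order onto one inter-site link and reach the peer after the
    propagation delay. Returns (step_time_s, overlap_fraction).
    """
    prop = distance_km * DELAY_S_PER_KM[fiber]   # one-way propagation delay
    ser = bucket_bytes * 8 / link_bps            # serialization time per bucket

    events = []  # min-heap of (time, sequence number, event kind)
    seq = 0
    for i in range(n_buckets):
        heapq.heappush(events, ((i + 1) * compute_per_bucket_s, seq, "ready"))
        seq += 1

    link_free = 0.0     # time at which the link finishes its current bucket
    last_arrival = 0.0  # time the final bucket reaches the peer
    while events:
        t, _, kind = heapq.heappop(events)
        if kind == "ready":
            # Start sending when both the bucket and the link are available.
            start = max(t, link_free)
            link_free = start + ser
            heapq.heappush(events, (link_free + prop, seq, "arrive"))
            seq += 1
        else:  # "arrive"
            last_arrival = max(last_arrival, t)

    compute_time = n_buckets * compute_per_bucket_s
    step_time = max(compute_time, last_arrival)
    exposed = step_time - compute_time   # communication not hidden by compute
    comm_serial = n_buckets * ser + prop  # communication time with no overlap
    overlap = 1.0 - exposed / comm_serial
    return step_time, overlap


if __name__ == "__main__":
    for d in (10, 100, 1000):
        for f in ("smf", "hcf"):
            step, ov = simulate_step(d, f)
            print(f"{d:>5} km {f}: step {step * 1e3:.2f} ms, overlap {ov:.3f}")
```

Under these assumptions the lower propagation delay of hollow-core fiber shortens the exposed tail of the step, so at every distance its overlap fraction exceeds that of standard fiber, and the gap widens as distance grows. A priority queue is used instead of a closed-form expression so that further event types, such as congestion or multi-hop links, could be added without restructuring the loop.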