Expanding into Reality: Random Graphs for Datacenter Networks
For large-scale datacenter operators, this work provides a practical, cost-effective alternative to traditional fat tree topologies.
Amazon designed and deployed RNG, the first production datacenter fabrics based on random graphs. RNG fabrics match or exceed fat tree performance while being up to 45% cheaper, and RNG is now Amazon's default fabric for most workloads.
We design and deploy at Amazon the first production datacenter fabrics based on random graphs. While the cost and fault-tolerance benefits of such topologies have long been known, their practical realization has been hampered by a lack of scalable routing and cabling approaches. Our design, called RNG, has a new distributed routing protocol that exploits the properties of random graphs to find a large number of edge-disjoint paths between endpoint pairs. A novel passive optical device that internally shuffles cable endpoints makes cabling complexity similar to that of fat trees. We show that RNG fabrics match or exceed the performance of fat trees for a range of traffic patterns, despite being up to 45% cheaper. At Amazon, we made RNG the default datacenter fabric for most workloads.
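To make the notion of edge-disjoint paths in a random graph concrete, here is a minimal Python sketch. It is not the RNG routing protocol, which is distributed and is not specified in this abstract. It is a centralized illustration that samples a random regular graph and counts edge-disjoint paths between two switches using unit-capacity max flow. The function names, the pairing-model sampler, and the parameters (20 switches, 4 fabric links each) are assumptions chosen for illustration.

```python
import random
from collections import deque


def random_regular_graph(n, d, seed=0):
    """Sample a simple d-regular graph on n nodes (pairing model with retries).

    Hypothetical helper for illustration; not how RNG fabrics are generated.
    """
    if (n * d) % 2 != 0 or d >= n:
        raise ValueError("need n*d even and d < n")
    rng = random.Random(seed)
    while True:
        # Each node contributes d "stubs"; pair the stubs up at random.
        stubs = [v for v in range(n) for _ in range(d)]
        rng.shuffle(stubs)
        edges = set()
        simple = True
        for a, b in zip(stubs[::2], stubs[1::2]):
            e = (min(a, b), max(a, b))
            if a == b or e in edges:  # reject self-loops and parallel links
                simple = False
                break
            edges.add(e)
        if simple:
            adj = {v: set() for v in range(n)}
            for a, b in edges:
                adj[a].add(b)
                adj[b].add(a)
            return adj


def edge_disjoint_path_count(adj, s, t):
    """Count edge-disjoint s-t paths via BFS augmenting paths (unit capacities)."""
    # Each undirected link becomes two directed arcs of capacity 1.
    cap = {(u, v): 1 for u in adj for v in adj[u]}
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left
        # Push one unit of flow back along the path found by BFS.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


if __name__ == "__main__":
    fabric = random_regular_graph(n=20, d=4, seed=1)
    print(edge_disjoint_path_count(fabric, 0, 1))
```

The count can never exceed the degree `d`, since every path must leave the source on a distinct link. Random regular graphs typically come close to that bound for most node pairs, which is the property the abstract says the routing protocol exploits.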