22.0NIMar 13
Evaluation of TCP Congestion Control for Public High-Performance Wide-Area NetworksFatih Berkay Sarpkaya, Andrea Francini, Bilgehan Erman et al.
Practitioners of a growing number of scientific and artificial-intelligence (AI) applications use High-Performance Wide-Area Networks (HP-WANs) for moving massive data sets between remote facilities. Accurate prediction of the flow completion time (FCT) is essential in these data-transfer workflows because compute and storage resources are tightly scheduled and expensive. We assess the viability of three TCP congestion control algorithms (CUBIC, BBRv1, and BBRv3) for massive data transfers over public HP-WANs, where limited control of critical data-path parameters precludes the use of Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCEv2), which is known to outperform TCP in private HP-WANs. Extensive experiments on the FABRIC testbed indicate that the configuration control limitations can also hinder TCP, especially through microburst-induced packet losses. Under these challenging conditions, we show that the highest FCT predictability is achieved by combination of BBRv1 with the application of traffic shaping before the HP-WAN entry points.
NIOct 9, 2019
Hierarchical Deep Double Q-RoutingRamy E. Ali, Bilgehan Erman, Ejder Baştuğ et al.
This paper explores a deep reinforcement learning approach applied to the packet routing problem with high-dimensional constraints instigated by dynamic and autonomous communication networks. Our approach is motivated by the fact that centralized path calculation approaches are often not scalable, whereas the distributed approaches with locally acting nodes are not fully aware of the end-to-end performance. We instead hierarchically distribute the path calculation over designated nodes in the network while taking into account the end-to-end performance. Specifically, we develop a hierarchical cluster-oriented adaptive per-flow path calculation mechanism by leveraging the Deep Double Q-network (DDQN) algorithm, where the end-to-end paths are calculated by the source nodes with the assistance of cluster (group) leaders at different hierarchical levels. In our approach, a deferred composite reward is designed to capture the end-to-end performance through a feedback signal from the source nodes to the group leaders and captures the local network performance through the local resource assessments by the group leaders. This approach scales in large networks, adapts to the dynamic demand, utilizes the network resources efficiently and can be applied to segment routing.