NIAug 11, 2023
Enhancing Network Management Using Code Generated by Large Language ModelsSathiya Kumaran Mani, Yajie Zhou, Kevin Hsieh et al.
Analyzing network topologies and communication graphs plays a crucial role in contemporary network management. However, the absence of a cohesive approach leads to a challenging learning curve, heightened errors, and inefficiencies. In this paper, we introduce a novel approach to facilitate a natural-language-based network management experience, utilizing large language models (LLMs) to generate task-specific code from natural language queries. This method tackles the challenges of explainability, scalability, and privacy by allowing network operators to inspect the generated code, eliminating the need to share network data with LLMs, and concentrating on application-specific requests combined with general program synthesis techniques. We design and evaluate a prototype system using benchmark applications, showcasing high accuracy, cost-effectiveness, and the potential for further enhancements using complementary program synthesis techniques.
NIJan 2, 2018
QuickCast: Fast and Efficient Inter-Datacenter Transfers using Forwarding Tree CohortsMohammad Noormohammadpour, Cauligi S. Raghavendra, Srikanth Kandula et al.
Large inter-datacenter transfers are crucial for cloud service efficiency and are increasingly used by organizations that have dedicated wide area networks between datacenters. A recent work uses multicast forwarding trees to reduce the bandwidth needs and improve completion times of point-to-multipoint transfers. Using a single forwarding tree per transfer, however, leads to poor performance because the slowest receiver dictates the completion time for all receivers. Using multiple forwarding trees per transfer alleviates this concern--the average receiver could finish early; however, if done naively, bandwidth usage would also increase and it is apriori unclear how best to partition receivers, how to construct the multiple trees and how to determine the rate and schedule of flows on these trees. This paper presents QuickCast, a first solution to these problems. Using simulations on real-world network topologies, we see that QuickCast can speed up the average receiver's completion time by as much as $10\times$ while only using $1.04\times$ more bandwidth; further, the completion time for all receivers also improves by as much as $1.6\times$ faster at high loads.
DCApr 25, 2016
Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics ClustersRobert Grandl, Srikanth Kandula, Sriram Rao et al.
We present a scheduler that improves cluster utilization and job completion times by packing tasks having multi-resource requirements and inter-dependencies. While the problem is algorithmically very hard, we achieve near-optimality on the job DAGs that appear in production clusters at a large enterprise and in benchmarks such as TPC-DS. A key insight is that carefully handling the long-running tasks and those with tough-to-pack resource needs will produce good-enough schedules. However, which subset of tasks to treat carefully is not clear (and intractable to discover). Hence, we offer a search procedure that evaluates various possibilities and outputs a preferred schedule order over tasks. An online component enforces the schedule orders desired by the various jobs running on the cluster. In addition, it packs tasks, overbooks the fungible resources and guarantees bounded unfairness for a variety of desirable fairness schemes. Relative to the state-of-the art schedulers, we speed up 50% of the jobs by over 30% each.
DCSep 15, 2019
Efficient Inter-Datacenter Bulk Transfers with Mixed Completion Time ObjectivesMohammad Noormohammadpour, Srikanth Kandula, Cauligi S. Raghavendra et al.
Bulk transfers from one to multiple datacenters can have many different completion time objectives ranging from quickly replicating some $k$ copies to minimizing the time by which the last destination receives a full replica. We design an SDN-style wide-area traffic scheduler that optimizes different completion time objectives for various requests. The scheduler builds, for each bulk transfer, one or more multicast forwarding trees which preferentially use lightly loaded network links. Multiple multicast trees are used per bulk transfer to insulate destinations that have higher available bandwidth and can hence finish quickly from congested destinations. These decisions--how many trees to construct and which receivers to serve using a given tree--result from an optimization problem that minimizes a weighted sum of transfers' completion time objectives and their bandwidth consumption. Results from simulations and emulations on Mininet show that our scheduler, Iris, can improve different completion time objectives by about $2.5\times$.
NIMar 1, 2023
A Deep Learning Perspective on Network RoutingYarin Perry, Felipe Vieira Frujeri, Chaim Hoch et al.
Routing is, arguably, the most fundamental task in computer networking, and the most extensively studied one. A key challenge for routing in real-world environments is the need to contend with uncertainty about future traffic demands. We present a new approach to routing under demand uncertainty: tackling this challenge as stochastic optimization, and employing deep learning to learn complex patterns in traffic demands. We show that our method provably converges to the global optimum in well-studied theoretical models of multicommodity flow. We exemplify the practical usefulness of our approach by zooming in on the real-world challenge of traffic engineering (TE) on wide-area networks (WANs). Our extensive empirical evaluation on real-world traffic and network topologies establishes that our approach's TE quality almost matches that of an (infeasible) omniscient oracle, outperforming previously proposed approaches, and also substantially lowers runtimes.
DBMar 2, 2025
Speculative Ad-hoc QueryingHaoyu Li, Srikanth Kandula, Maria Angels de Luis Balaguer et al.
Analyzing large datasets requires responsive query execution, but executing SQL queries on massive datasets can be slow. This paper explores whether query execution can begin even before the user has finished typing, allowing results to appear almost instantly. We propose SpeQL, a system that leverages Large Language Models (LLMs) to predict likely queries based on the database schema, the user's past queries, and their incomplete query. Since exact query prediction is infeasible, SpeQL speculates on partial queries in two ways: 1) it predicts the query structure to compile and plan queries in advance, and 2) it precomputes smaller temporary tables that are much smaller than the original database, but are still predicted to contain all information necessary to answer the user's final query. Additionally, SpeQL continuously displays results for speculated queries and subqueries in real time, aiding exploratory analysis. A utility/user study showed that SpeQL improved task completion time, and participants reported that its speculative display of results helped them discover patterns in the data more quickly. In the study, SpeQL improves user's query latency by up to $289\times$ and kept the overhead reasonable, at $\$4$ per hour.
NIJul 7, 2017
DCCast: Efficient Point to Multipoint Transfers Across DatacentersMohammad Noormohammadpour, Cauligi S. Raghavendra, Sriram Rao et al.
Using multiple datacenters allows for higher availability, load balancing and reduced latency to customers of cloud services. To distribute multiple copies of data, cloud providers depend on inter-datacenter WANs that ought to be used efficiently considering their limited capacity and the ever-increasing data demands. In this paper, we focus on applications that transfer objects from one datacenter to several datacenters over dedicated inter-datacenter networks. We present DCCast, a centralized Point to Multi-Point (P2MP) algorithm that uses forwarding trees to efficiently deliver an object from a source datacenter to required destination datacenters. With low computational overhead, DCCast selects forwarding trees that minimize bandwidth usage and balance load across all links. With simulation experiments on Google's GScale network, we show that DCCast can reduce total bandwidth usage and tail Transfer Completion Times (TCT) by up to $50\%$ compared to delivering the same objects via independent point-to-point (P2P) transfers.