Srikanth Kandula

h-index: 57

5 papers

111 citations

Novelty: 56%

AI Score: 29

Ranked #140,878 of 194,257 authors (top 73%); #369 in NI (top 60%)

5 Papers

13.0 | NI | Aug 11, 2023 | Code

Enhancing Network Management Using Code Generated by Large Language Models

Sathiya Kumaran Mani, Yajie Zhou, Kevin Hsieh et al.

Analyzing network topologies and communication graphs plays a crucial role in contemporary network management. However, the absence of a cohesive approach leads to a challenging learning curve, heightened errors, and inefficiencies. In this paper, we introduce a novel approach to facilitate a natural-language-based network management experience, utilizing large language models (LLMs) to generate task-specific code from natural language queries. This method tackles the challenges of explainability, scalability, and privacy by allowing network operators to inspect the generated code, eliminating the need to share network data with LLMs, and concentrating on application-specific requests combined with general program synthesis techniques. We design and evaluate a prototype system using benchmark applications, showcasing high accuracy, cost-effectiveness, and the potential for further enhancements using complementary program synthesis techniques.

1.2 | DC | Apr 25, 2016

Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics Clusters

Robert Grandl, Srikanth Kandula, Sriram Rao et al.

We present a scheduler that improves cluster utilization and job completion times by packing tasks having multi-resource requirements and inter-dependencies. While the problem is algorithmically very hard, we achieve near-optimality on the job DAGs that appear in production clusters at a large enterprise and in benchmarks such as TPC-DS. A key insight is that carefully handling the long-running tasks and those with tough-to-pack resource needs will produce good-enough schedules. However, which subset of tasks to treat carefully is not clear (and intractable to discover). Hence, we offer a search procedure that evaluates various possibilities and outputs a preferred schedule order over tasks. An online component enforces the schedule orders desired by the various jobs running on the cluster. In addition, it packs tasks, overbooks the fungible resources and guarantees bounded unfairness for a variety of desirable fairness schemes. Relative to the state-of-the-art schedulers, we speed up 50% of the jobs by over 30% each.

1.2 | DC | Sep 15, 2019

Efficient Inter-Datacenter Bulk Transfers with Mixed Completion Time Objectives

Mohammad Noormohammadpour, Srikanth Kandula, Cauligi S. Raghavendra et al.

Bulk transfers from one to multiple datacenters can have many different completion time objectives ranging from quickly replicating some $k$ copies to minimizing the time by which the last destination receives a full replica. We design an SDN-style wide-area traffic scheduler that optimizes different completion time objectives for various requests. The scheduler builds, for each bulk transfer, one or more multicast forwarding trees which preferentially use lightly loaded network links. Multiple multicast trees are used per bulk transfer to insulate destinations that have higher available bandwidth and can hence finish quickly from congested destinations. These decisions--how many trees to construct and which receivers to serve using a given tree--result from an optimization problem that minimizes a weighted sum of transfers' completion time objectives and their bandwidth consumption. Results from simulations and emulations on Mininet show that our scheduler, Iris, can improve different completion time objectives by about $2.5\times$.

1.2 | NI | Mar 1, 2023

A Deep Learning Perspective on Network Routing

Yarin Perry, Felipe Vieira Frujeri, Chaim Hoch et al.

Routing is, arguably, the most fundamental task in computer networking, and the most extensively studied one. A key challenge for routing in real-world environments is the need to contend with uncertainty about future traffic demands. We present a new approach to routing under demand uncertainty: tackling this challenge as stochastic optimization, and employing deep learning to learn complex patterns in traffic demands. We show that our method provably converges to the global optimum in well-studied theoretical models of multicommodity flow. We exemplify the practical usefulness of our approach by zooming in on the real-world challenge of traffic engineering (TE) on wide-area networks (WANs). Our extensive empirical evaluation on real-world traffic and network topologies establishes that our approach's TE quality almost matches that of an (infeasible) omniscient oracle, outperforming previously proposed approaches, and also substantially lowers runtimes.

1.2 | NI | Jul 7, 2017

DCCast: Efficient Point to Multipoint Transfers Across Datacenters

Mohammad Noormohammadpour, Cauligi S. Raghavendra, Sriram Rao et al.

Using multiple datacenters allows for higher availability, load balancing and reduced latency to customers of cloud services. To distribute multiple copies of data, cloud providers depend on inter-datacenter WANs that ought to be used efficiently considering their limited capacity and the ever-increasing data demands. In this paper, we focus on applications that transfer objects from one datacenter to several datacenters over dedicated inter-datacenter networks. We present DCCast, a centralized Point to Multi-Point (P2MP) algorithm that uses forwarding trees to efficiently deliver an object from a source datacenter to required destination datacenters. With low computational overhead, DCCast selects forwarding trees that minimize bandwidth usage and balance load across all links. With simulation experiments on Google's GScale network, we show that DCCast can reduce total bandwidth usage and tail Transfer Completion Times (TCT) by up to $50\%$ compared to delivering the same objects via independent point-to-point (P2P) transfers.