Sriram Rao

NI
6papers
125citations
Novelty46%
AI Score25

6 Papers

NIJan 2, 2018
QuickCast: Fast and Efficient Inter-Datacenter Transfers using Forwarding Tree Cohorts

Mohammad Noormohammadpour, Cauligi S. Raghavendra, Srikanth Kandula et al.

Large inter-datacenter transfers are crucial for cloud service efficiency and are increasingly used by organizations that have dedicated wide area networks between datacenters. A recent work uses multicast forwarding trees to reduce the bandwidth needs and improve completion times of point-to-multipoint transfers. Using a single forwarding tree per transfer, however, leads to poor performance because the slowest receiver dictates the completion time for all receivers. Using multiple forwarding trees per transfer alleviates this concern--the average receiver could finish early; however, if done naively, bandwidth usage would also increase and it is apriori unclear how best to partition receivers, how to construct the multiple trees and how to determine the rate and schedule of flows on these trees. This paper presents QuickCast, a first solution to these problems. Using simulations on real-world network topologies, we see that QuickCast can speed up the average receiver's completion time by as much as $10\times$ while only using $1.04\times$ more bandwidth; further, the completion time for all receivers also improves by as much as $1.6\times$ faster at high loads.

DCApr 25, 2016
Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics Clusters

Robert Grandl, Srikanth Kandula, Sriram Rao et al.

We present a scheduler that improves cluster utilization and job completion times by packing tasks having multi-resource requirements and inter-dependencies. While the problem is algorithmically very hard, we achieve near-optimality on the job DAGs that appear in production clusters at a large enterprise and in benchmarks such as TPC-DS. A key insight is that carefully handling the long-running tasks and those with tough-to-pack resource needs will produce good-enough schedules. However, which subset of tasks to treat carefully is not clear (and intractable to discover). Hence, we offer a search procedure that evaluates various possibilities and outputs a preferred schedule order over tasks. An online component enforces the schedule orders desired by the various jobs running on the cluster. In addition, it packs tasks, overbooks the fungible resources and guarantees bounded unfairness for a variety of desirable fairness schemes. Relative to the state-of-the art schedulers, we speed up 50% of the jobs by over 30% each.

DCSep 15, 2019
Efficient Inter-Datacenter Bulk Transfers with Mixed Completion Time Objectives

Mohammad Noormohammadpour, Srikanth Kandula, Cauligi S. Raghavendra et al.

Bulk transfers from one to multiple datacenters can have many different completion time objectives ranging from quickly replicating some $k$ copies to minimizing the time by which the last destination receives a full replica. We design an SDN-style wide-area traffic scheduler that optimizes different completion time objectives for various requests. The scheduler builds, for each bulk transfer, one or more multicast forwarding trees which preferentially use lightly loaded network links. Multiple multicast trees are used per bulk transfer to insulate destinations that have higher available bandwidth and can hence finish quickly from congested destinations. These decisions--how many trees to construct and which receivers to serve using a given tree--result from an optimization problem that minimizes a weighted sum of transfers' completion time objectives and their bandwidth consumption. Results from simulations and emulations on Mininet show that our scheduler, Iris, can improve different completion time objectives by about $2.5\times$.

EPJan 22, 2021Code
Nigraha: Machine-learning based pipeline to identify and evaluate planet candidates from TESS

Sriram Rao, Ashish Mahabal, Niyanth Rao et al.

The Transiting Exoplanet Survey Satellite (TESS) has now been operational for a little over two years, covering the Northern and the Southern hemispheres once. The TESS team processes the downlinked data using the Science Processing Operations Center pipeline and Quick Look pipeline to generate alerts for follow-up. Combined with other efforts from the community, over two thousand planet candidates have been found of which tens have been confirmed as planets. We present our pipeline, Nigraha, that is complementary to these approaches. Nigraha uses a combination of transit finding, supervised machine learning, and detailed vetting to identify with high confidence a few planet candidates that were missed by prior searches. In particular, we identify high signal to noise ratio (SNR) shallow transits that may represent more Earth-like planets. In the spirit of open data exploration we provide details of our pipeline, release our supervised machine learning model and code as open source, and make public the 38 candidates we have found in seven sectors. The model can easily be run on other sectors as is. As part of future work we outline ways to increase the yield by strengthening some of the steps where we have been conservative and discarded objects for lack of a datum or two.

NIJul 13, 2017
DCRoute: Speeding up Inter-Datacenter Traffic Allocation while Guaranteeing Deadlines

Mohammad Noormohammadpour, Cauligi S. Raghavendra, Sriram Rao

Datacenters provide the infrastructure for cloud computing services used by millions of users everyday. Many such services are distributed over multiple datacenters at geographically distant locations possibly in different continents. These datacenters are then connected through high speed WAN links over private or public networks. To perform data backups or data synchronization operations, many transfers take place over these networks that have to be completed before a deadline in order to provide necessary service guarantees to end users. Upon arrival of a transfer request, we would like the system to be able to decide whether such a request can be guaranteed successful delivery. If yes, it should provide us with transmission schedule in the shortest time possible. In addition, we would like to avoid packet reordering at the destination as it affects TCP performance. Previous work in this area either cannot guarantee that admitted transfers actually finish before the specified deadlines or use techniques that can result in packet reordering. In this paper, we propose DCRoute, a fast and efficient routing and traffic allocation technique that guarantees transfer completion before deadlines for admitted requests. It assigns each transfer a single path to avoid packet reordering. Through simulations, we show that DCRoute is at least 200 times faster than other traffic allocation techniques based on linear programming (LP) while admitting almost the same amount of traffic to the system.

NIJul 7, 2017
DCCast: Efficient Point to Multipoint Transfers Across Datacenters

Mohammad Noormohammadpour, Cauligi S. Raghavendra, Sriram Rao et al.

Using multiple datacenters allows for higher availability, load balancing and reduced latency to customers of cloud services. To distribute multiple copies of data, cloud providers depend on inter-datacenter WANs that ought to be used efficiently considering their limited capacity and the ever-increasing data demands. In this paper, we focus on applications that transfer objects from one datacenter to several datacenters over dedicated inter-datacenter networks. We present DCCast, a centralized Point to Multi-Point (P2MP) algorithm that uses forwarding trees to efficiently deliver an object from a source datacenter to required destination datacenters. With low computational overhead, DCCast selects forwarding trees that minimize bandwidth usage and balance load across all links. With simulation experiments on Google's GScale network, we show that DCCast can reduce total bandwidth usage and tail Transfer Completion Times (TCT) by up to $50\%$ compared to delivering the same objects via independent point-to-point (P2P) transfers.