DC · May 6
A Performance Analyzer for a Public Cloud's ML-Augmented VM Allocator
Roozbeh Bostandoost, Pooria Namyar, Siva Kesava Reddy Kakarla et al.
Cloud operators increasingly deploy multiple ML models in their VM allocation pipelines. In such settings, individually benign predictions can shift and compound, severely degrading performance. In a cloud provider's VM placement pipeline, CPU, memory, and lifetime prediction models jointly determine server count, live migration frequency, and network utilization; yet no existing approach can systematically stress-test how these models adversely interact. Deterministic adversarial analyzers cannot capture probabilistic ML behavior, so operators miss failures that arise only from correlated distributional shifts across models. In SANJESH, we formulate a bi-level optimization that captures how the ML models behave statistically and uncovers how they adversely interact. The outer level searches over the predictions the ML models could produce under distributional uncertainty to find adversarial conditions; the inner level evaluates how the VM allocator behaves given those predictions. When we applied it to the operator's production traces, SANJESH uncovered scenarios that cause $4\times$ worse performance than the operator's evaluator detected.
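To make the bi-level structure concrete, here is a minimal Python sketch under strong simplifying assumptions: only CPU and memory predictors are modeled (no lifetime model), their distributional shift is reduced to a multiplicative bias, the inner level is a simple first-fit-decreasing allocator, and the outer level is an exhaustive grid search rather than the optimization SANJESH actually solves. The capacities, the overload penalty (a stand-in for live migrations), and all function names (`place`, `cost`, `worst_case`) are hypothetical.

```python
import itertools
from typing import List, Tuple

# Hypothetical server capacity (CPU cores, memory GB); not taken from the paper.
CPU_CAP, MEM_CAP = 16.0, 64.0

Demand = Tuple[float, float]  # (cpu, mem) of one VM


def place(demands: List[Demand]) -> List[List[int]]:
    """Inner level: first-fit-decreasing placement driven by *predicted* demands."""
    order = sorted(range(len(demands)), key=lambda i: demands[i][0], reverse=True)
    servers: List[List[int]] = []  # VM indices hosted on each server
    used: List[List[float]] = []   # [cpu, mem] reserved on each server
    for i in order:
        cpu, mem = demands[i]
        for s in range(len(used)):
            if used[s][0] + cpu <= CPU_CAP and used[s][1] + mem <= MEM_CAP:
                servers[s].append(i)
                used[s][0] += cpu
                used[s][1] += mem
                break
        else:  # no existing server fits: open a new one
            servers.append([i])
            used.append([cpu, mem])
    return servers


def cost(true: List[Demand], bias_cpu: float, bias_mem: float, penalty: float = 2.0) -> float:
    """Servers opened, plus a penalty per server whose *true* load overflows
    (a hypothetical stand-in for the live migrations that overflow would trigger)."""
    predicted = [(c * (1 + bias_cpu), m * (1 + bias_mem)) for c, m in true]
    servers = place(predicted)
    overloaded = sum(
        1
        for vms in servers
        if sum(true[i][0] for i in vms) > CPU_CAP or sum(true[i][1] for i in vms) > MEM_CAP
    )
    return len(servers) + penalty * overloaded


def worst_case(true: List[Demand], biases: List[float]) -> Tuple[float, float, float]:
    """Outer level: search joint shifts of both predictors for the one maximizing cost."""
    return max((cost(true, bc, bm), bc, bm) for bc, bm in itertools.product(biases, biases))


vms = [(4.0, 16.0), (8.0, 20.0), (2.0, 8.0), (6.0, 30.0), (3.0, 10.0), (5.0, 12.0)]
biases = [k / 10 for k in range(-3, 4)]  # -30% .. +30%, includes exactly 0.0
print("worst (cost, cpu bias, mem bias):", worst_case(vms, biases))
print("unperturbed cost:", cost(vms, 0.0, 0.0))
```

Because the grid includes the zero-bias point, the worst case found is never below the unperturbed cost; the interesting output is which joint shift of the two predictors drives it up.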
DC · Dec 17, 2025
Dynamic Rebatching for Efficient Early-Exit Inference with DREX
Xuting Liu, Daniel Alexander, Siva Kesava Reddy Kakarla et al.
Early-Exit (EE) is a Large Language Model (LLM) architecture that accelerates inference by allowing easier tokens to be generated using only a subset of the model's layers. However, traditional batching frameworks are ill-suited for EE LLMs because not all requests in a batch are ready to exit at the same time. Existing solutions either force a uniform decision on the whole batch, which overlooks EE opportunities, or degrade output quality by forcing premature exits. We propose Dynamic Rebatching, a solution that dynamically reorganizes the batch at each early-exit point. Requests that meet the exit criteria are processed immediately, while those that continue are held in a buffer, regrouped into a new batch, and forwarded to deeper layers. We introduce DREX, an early-exit inference system that implements Dynamic Rebatching with two key optimizations: 1) a copy-free rebatching buffer that avoids physical data movement, and 2) an EE- and SLA-aware scheduler that analytically predicts whether a given rebatching operation will be profitable. DREX also efficiently handles the KV cache entries missing for skipped layers using memory-efficient state copying. Our evaluation shows that DREX improves throughput by 2-12% compared to baseline approaches while maintaining output quality. Crucially, DREX completely eliminates involuntary exits, providing a key guarantee for preserving the output quality intended by the EE model.
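The split, buffer, and regroup step can be sketched in a few lines of Python. This illustrates the idea, not DREX's implementation: requests are plain integer ids standing in for indices into a shared hidden-state store (the copy-free idea), confidence scores and the exit threshold are assumed inputs, and the profitability test is a hypothetical linear cost model (`saving_per_exit`, `rebatch_overhead`) rather than DREX's analytical, SLA-aware predictor. `RebatchBuffer` and `at_exit_point` are likewise hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RebatchBuffer:
    """Copy-free buffer (sketch): stores request ids that index a shared
    hidden-state store, so regrouping moves ids rather than activations."""
    pending: List[int] = field(default_factory=list)

    def next_batch(self, max_size: int) -> List[int]:
        """Regroup buffered requests into a new batch for the deeper layers."""
        batch, self.pending = self.pending[:max_size], self.pending[max_size:]
        return batch


def at_exit_point(
    batch: List[int],
    confidence: List[float],
    threshold: float,
    buf: RebatchBuffer,
    saving_per_exit: float,   # hypothetical: time saved per request that exits here
    rebatch_overhead: float,  # hypothetical: fixed cost of splitting the batch
) -> Tuple[List[int], bool]:
    """Return (requests that exit now, whether the batch was split)."""
    ready = [r for r, c in zip(batch, confidence) if c >= threshold]
    cont = [r for r, c in zip(batch, confidence) if c < threshold]
    if ready and len(ready) * saving_per_exit > rebatch_overhead:
        buf.pending.extend(cont)  # only unready requests go deeper
        return ready, True
    # Split not profitable: the whole batch continues, exit-ready requests included.
    # A request below the threshold is never forced out (no involuntary exits).
    buf.pending.extend(batch)
    return [], False


buf = RebatchBuffer()
exited, split = at_exit_point([0, 1, 2, 3], [0.95, 0.40, 0.91, 0.70], 0.9, buf, 1.0, 1.5)
regrouped = buf.next_batch(8)
print(exited, split, regrouped)  # prints: [0, 2] True [1, 3]
```

Letting exit-ready requests ride along when a split does not pay off costs some compute but never costs quality, which is the property the abstract emphasizes.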
AI · Oct 19, 2024
Towards Safer Heuristics With XPlain
Pantea Karimi, Solal Pirelli, Siva Kesava Reddy Kakarla et al.
Many problems that cloud operators solve are computationally expensive, and operators often use heuristic algorithms (which are faster and scale better than optimal algorithms) to solve them more efficiently. Heuristic analyzers enable operators to find when and by how much their heuristics underperform. However, these tools do not give operators enough detail to mitigate the heuristic's impact in practice: they discover only a single input instance that causes the heuristic to underperform (not the full set), and they do not explain why. We propose XPlain, a tool that extends these analyzers and helps operators understand when and why their heuristics underperform. We present promising initial results showing that such an extension is viable.
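The difference between finding one bad input and finding all of them can be shown on a toy problem. The sketch below assumes a 0/1 knapsack with a value-per-weight greedy rule as the heuristic under study (a stand-in, not one of the paper's heuristics), enumerates a small input space exhaustively instead of using a solver-based analyzer, and records, for every input where the heuristic loses, the items on which it diverges from the optimum as a crude stand-in for an explanation. All function names are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple


def greedy_knapsack(values: Sequence[int], weights: Sequence[int], cap: int) -> List[int]:
    """Heuristic under study: take items by decreasing value/weight while they fit."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    chosen, load = [], 0
    for i in order:
        if load + weights[i] <= cap:
            chosen.append(i)
            load += weights[i]
    return sorted(chosen)


def optimal_knapsack(values: Sequence[int], weights: Sequence[int], cap: int) -> List[int]:
    """Exact answer by brute force over all subsets (tiny instances only)."""
    best, best_set = -1, []
    for mask in range(1 << len(values)):
        items = [i for i in range(len(values)) if mask >> i & 1]
        if sum(weights[i] for i in items) <= cap:
            value = sum(values[i] for i in items)
            if value > best:
                best, best_set = value, items
    return best_set


def find_all_bad_inputs(
    weights: Sequence[int], cap: int, value_range: Sequence[int]
) -> List[Tuple[Tuple[int, ...], int, List[int], List[int]]]:
    """Return every value vector where the heuristic loses, not just one."""
    bad = []
    for values in itertools.product(value_range, repeat=len(weights)):
        h = greedy_knapsack(values, weights, cap)
        o = optimal_knapsack(values, weights, cap)
        gap = sum(values[i] for i in o) - sum(values[i] for i in h)
        if gap > 0:
            # Crude "why": items the heuristic took that the optimum skips, and vice versa.
            bad.append((values, gap, sorted(set(h) - set(o)), sorted(set(o) - set(h))))
    return bad


bad = find_all_bad_inputs(weights=(3, 2, 2), cap=4, value_range=range(1, 6))
print(len(bad), "bad inputs out of", 5 ** 3)
print(bad[0])
```

Grouping the returned divergences (here, the greedy rule grabbing the single heavy item and crowding out two lighter ones) is the kind of pattern a full-set view exposes and a single counterexample does not.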
AI · Oct 9, 2025
Robust Heuristic Algorithm Design with LLMs
Pantea Karimi, Dany Rouhana, Pooria Namyar et al.
We posit that we can generate more robust and performant heuristics if we augment LLM-based approaches to heuristic design with tools that explain why heuristics underperform and suggest how to fix them. We find that even simple ideas, namely (1) exposing the LLM to instances where the heuristic underperforms, (2) explaining why it underperforms on them, and (3) specializing the design to regions of the input space, can produce more robust algorithms than existing techniques: the heuristics we produce have $\sim28\times$ better worst-case performance than FunSearch, improve average performance, and maintain the runtime.
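Ideas (1) and (3) can be illustrated without an LLM on a toy problem. A minimal sketch, assuming: two-machine job scheduling (minimize makespan) as the problem; two hand-written candidate heuristics standing in for LLM-generated ones (no LLM is called); and a hypothetical region split based on whether one job dominates the total work. `worst_instances` returns what idea (1) would feed back to the model, and `specialize` keeps, per region, the candidate with the smallest worst-case gap. Idea (2), the explanation step, is not modeled; all names are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Jobs = Tuple[int, ...]
Heuristic = Callable[[Jobs], int]


def greedy_in_order(jobs: Jobs) -> int:
    """Candidate A: put each job, in input order, on the less-loaded of two machines."""
    loads = [0, 0]
    for j in jobs:
        loads[loads.index(min(loads))] += j
    return max(loads)


def longest_first(jobs: Jobs) -> int:
    """Candidate B: the same greedy rule after sorting jobs longest-first (LPT)."""
    return greedy_in_order(tuple(sorted(jobs, reverse=True)))


def optimal(jobs: Jobs) -> int:
    """Exact two-machine makespan by brute force (tiny inputs only)."""
    total = sum(jobs)
    best = total
    for mask in range(1 << len(jobs)):
        a = sum(j for i, j in enumerate(jobs) if mask >> i & 1)
        best = min(best, max(a, total - a))
    return best


def region(jobs: Jobs) -> str:
    """Hypothetical input-space partition: is one job at least half the total work?"""
    return "dominant" if 2 * max(jobs) >= sum(jobs) else "balanced"


def gap(h: Heuristic, jobs: Jobs) -> int:
    return h(jobs) - optimal(jobs)


def worst_instances(h: Heuristic, inputs: List[Jobs], k: int = 3) -> List[Jobs]:
    """Idea (1): the k inputs where h underperforms most, e.g. to show an LLM."""
    return sorted(inputs, key=lambda x: gap(h, x), reverse=True)[:k]


def specialize(cands: Dict[str, Heuristic], inputs: List[Jobs]) -> Dict[str, str]:
    """Idea (3): per region, keep the candidate with the smallest worst-case gap."""
    choice = {}
    for r in {region(x) for x in inputs}:
        members = [x for x in inputs if region(x) == r]
        choice[r] = min(cands, key=lambda n: max(gap(cands[n], x) for x in members))
    return choice


candidates = {"in_order": greedy_in_order, "lpt": longest_first}
inputs = [jobs for n in (3, 4, 5) for jobs in itertools.product(range(1, 5), repeat=n)]
choice = specialize(candidates, inputs)
combined = max(gap(candidates[choice[region(x)]], x) for x in inputs)
print(choice, combined, worst_instances(greedy_in_order, inputs))
```

Per-region selection can only help the worst case: the combined heuristic's worst gap is at most that of the best single candidate, since each region uses whichever candidate is safest there.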
CR · Feb 24, 2019
Expect More from the Networking: DDoS Mitigation by FITT in Named Data Networking
Zhiyi Zhang, Vishrant Vasavada, Siva Kesava Reddy Kakarla et al.
Distributed Denial of Service (DDoS) attacks have plagued the Internet for decades, but the basic defense approaches have not fundamentally changed. Rather, the size and growth rate of attacks have outpaced the growth of carriers and DDoS mitigation services, calling for new, effective solutions that can be deployed, partially or fully, in the near term. In this paper, we examine the basic functions of Named Data Networking (NDN), a newly proposed Internet architecture, that can address the principal weaknesses of today's IP networks. Through a new DDoS mitigation solution over NDN, Fine-grained Interest Traffic Throttling (FITT), we demonstrate that NDN's architectural changes, even when incrementally deployed, can make DDoS attacks fundamentally more difficult to launch and less effective. FITT leverages the NDN design to let the network detect DDoS attacks from the victim's feedback, throttle DDoS traffic by reversing its exact paths through the network, and enforce control over misbehaving entities at their sources. Our extensive simulation results show that FITT can throttle attack traffic within a one-way delay from the victim to the NDN gateway; upon activation, FITT effectively stops attack traffic from impacting benign flows, with over 99% of the packets reaching victims being legitimate. We further demonstrate that service providers can implement NDN/FITT on existing CDN nodes as an incrementally deployable solution that achieves application-level remediation at the sources, which remains unattainable with today's DDoS mitigation approaches.
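Path reversal relies on NDN routers keeping per-name state about which neighbor each Interest arrived from. The Python sketch below is a simplified model under stated assumptions: the Pending Interest Table (PIT) is treated as a persistent record (real PIT entries are removed once Data returns), detection at the victim is assumed to have already happened, and the class and function names (`Router`, `send_interest`, `push_back`) and the rate-cap field are hypothetical, not FITT's actual mechanism or message format.

```python
from collections import defaultdict
from typing import Dict, List, Set


class Router:
    def __init__(self) -> None:
        # prefix -> neighbors that forwarded us Interests for it (simplified, persistent PIT)
        self.pit: Dict[str, Set[str]] = defaultdict(set)
        # prefix -> rate cap (Interests/s); only set where the throttle is enforced
        self.limits: Dict[str, float] = {}


def send_interest(net: Dict[str, Router], path: List[str], prefix: str) -> None:
    """Record one Interest forwarded along `path` (consumer-side gateway first, producer last)."""
    for downstream, upstream in zip(path, path[1:]):
        net[upstream].pit[prefix].add(downstream)


def push_back(net: Dict[str, Router], victim: str, prefix: str, rate: float) -> List[str]:
    """Victim feedback walks the recorded Interest paths in reverse; nodes with no
    downstream entry (the edge gateways) enforce the rate cap at the source side."""
    enforced, seen, stack = [], set(), [victim]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        downstream = net[node].pit.get(prefix, set())
        if downstream:
            stack.extend(downstream)
        else:
            net[node].limits[prefix] = rate
            enforced.append(node)
    return sorted(enforced)


net: Dict[str, Router] = defaultdict(Router)
for path in (["gwA", "r1", "r2", "victim"], ["gwB", "r1", "r2", "victim"], ["gwC", "r3", "victim"]):
    send_interest(net, path, "/victim/app")
send_interest(net, ["gwD", "r3", "server2"], "/server2/app")  # benign traffic, other prefix
throttled = push_back(net, "victim", "/victim/app", rate=10.0)
print(throttled)  # prints: ['gwA', 'gwB', 'gwC']
```

Because the walk follows only state recorded for the attacked prefix, the gateway carrying unrelated traffic (`gwD`) and the transit routers are left untouched; the cap lands only where the matching Interests entered the network.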