NI · Apr 16
Tail Contagion: Sub-microsecond Time Protection in Shared Software Network Datapaths
Matheus Stolet, Liam Arzola, Simon Peter et al.
Shared software datapaths underpin modern datacentre networking. They implement mechanisms such as virtual switching, network virtualisation tunneling, or reliable transport, and enforce policies, such as tenant rate limits, virtual network isolation, or congestion control. However, because multiple applications, containers, or VMs share them, often across tenants, they pose a tail latency isolation challenge. Current isolation approaches either sacrifice efficiency via coarse-grained core partitioning or provide weak tail latency isolation when sharing cores with basic rate limits. This paper presents Virtuoso, a time protection mechanism for shared software datapaths that provides strong cross-tenant tail latency isolation while preserving low overhead and microsecond-scale latency. Our key insight is that tail latency is fundamentally a time metric, so byte or packet throughput is the wrong metric for controlling interference when packet processing costs vary. Our design instead enforces isolation through per-tenant CPU-time budgets at datapath intervention points within run-to-completion loops, without relying on preemption. In a case study, we instantiate Virtuoso in the TAS TCP stack and demonstrate a 7.8X reduction in victim tail latency under adversarial interference while keeping throughput within 5% of unmodified TAS. We also observe a 3X per-core efficiency improvement compared to siloed datapaths under bursty workloads.
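The idea of budgeting CPU time rather than bytes or packets can be sketched as below. This is a minimal illustration, not Virtuoso's implementation: the function name `run_budgeted_rounds`, the per-round credit scheme, and the simulated per-packet costs are all assumptions made for the example. The budget is checked only between packets (an intervention point), so a tenant can overshoot by at most one packet, and the overshoot is carried forward as debt.

```python
from collections import deque


def run_budgeted_rounds(queues, budget_ns, rounds):
    """Run a simulated run-to-completion loop with per-tenant time budgets.

    queues: dict mapping tenant -> deque of per-packet costs in ns
            (simulated; a real datapath would measure elapsed cycles).
    budget_ns: CPU time credited to each tenant per round (hypothetical knob).
    Returns a dict mapping tenant -> total ns of processing consumed.
    """
    credit = {tenant: 0 for tenant in queues}
    used_total = {tenant: 0 for tenant in queues}
    for _ in range(rounds):
        for tenant, queue in queues.items():
            # Refill the budget, but never bank more than one round of credit.
            credit[tenant] = min(credit[tenant] + budget_ns, budget_ns)
            # Intervention point: check the budget between packets only,
            # so packet processing itself is never preempted.
            while queue and credit[tenant] > 0:
                cost = queue.popleft()
                credit[tenant] -= cost  # may go negative; debt carries over
                used_total[tenant] += cost
    return used_total


queues = {
    "victim": deque([100] * 10),      # cheap packets
    "adversary": deque([1000] * 10),  # expensive packets
}
usage = run_budgeted_rounds(queues, budget_ns=500, rounds=2)
print(usage)
```

In this run both tenants consume the same amount of CPU time even though the adversary's packets cost ten times more, which a packet-count limit would not achieve.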
DC · Apr 24
LaissezCloud: Continuous Resource Renegotiation for the Public Cloud
Tejas Harith, Antoine Kaufmann
Public clouds increasingly expose heterogeneous hardware, but their allocation interface remains built around rigid on-demand and spot service classes. This makes it hard to satisfy time-varying tenant objectives and operator constraints in oversubscribed, heterogeneous clusters without exposing internal application or infrastructure state. We present LaissezCloud, a cloud resource management platform for continuous re-negotiation of running allocations. Unlike spot instances, which use launch-time bids and unilateral preemption, LaissezCloud keeps allocations continuously contestable during execution: tenants and operators update bids online, and a running tenant keeps a resource only as long as its bid exceeds competing demand. Pricing serves both as a narrow waist and as an incentive-alignment mechanism between mutually untrusted participants: tenants express utility through bids, while operators price in power, cooling, or carbon constraints without exposing internal telemetry. Across a diverse set of accelerator workloads, LaissezCloud reduces performance degradation under contention by 8-23% versus on-demand and spot baselines, and scales to clusters of at least 10,000 nodes.
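A toy model of a continuously contestable allocation, assuming a single resource, a highest-bid-wins rule, and an operator reserve price standing in for power, cooling, or carbon constraints. The function `current_holder`, the tie-breaking behaviour, and the reserve-price mechanism are illustrative assumptions, not LaissezCloud's actual interface or pricing rule.

```python
def current_holder(bids, reserve_price):
    """Return the tenant that should hold the resource right now, or None.

    bids: dict mapping tenant -> current bid (tenants may update it online).
    reserve_price: operator-set floor (hypothetical), which lets the operator
                   price in constraints without exposing internal telemetry.
    Ties go to the tenant that appears first in `bids` (an arbitrary choice).
    """
    eligible = {t: b for t, b in bids.items() if b >= reserve_price}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


bids = {"a": 5.0, "b": 3.0}
trace = [current_holder(bids, reserve_price=2.0)]    # "a" holds the resource

bids["b"] = 6.0                                      # "b" raises its bid mid-run
trace.append(current_holder(bids, reserve_price=2.0))

# The operator raises the floor above every bid, e.g. under a power cap.
trace.append(current_holder(bids, reserve_price=7.0))
print(trace)
```

Re-evaluating the holder on every bid or price change is what distinguishes this from a launch-time auction: the running tenant is displaced as soon as its bid no longer wins.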
NI · Apr 24
Chamelio: A Fast Shared Cloud Network Stack for Isolated Tenant-Defined Protocols
Matheus Stolet, Simon Peter, Antoine Kaufmann
Conventional cloud network virtualization sends packets through multiple guest and host layers, inflating CPU cost and tail latency. Shared host datapaths collapse this layering into one optimized path across tenants, but existing shared stacks are fixed-function: tenants cannot specialize their protocols. eBPF is the natural vehicle for restoring programmability to a shared datapath, but today's extensions are hook-sized, and its verifier provides safety, not performance isolation: one tenant's per-packet work can inflate every other tenant's tail latency. Chamelio is a programmable shared network stack that lets tenants implement full protocols through a bounded eBPF fast path and a tenant slow path, while approaching the performance and preserving the strong isolation of fixed shared stacks. It combines three ideas: a shared-stack architecture for tenant-defined protocols; joint optimisation of tenant handlers with provider infrastructure and co-resident tenants in the shared fast path; and a bounded fast-path contract with runtime cycle accounting that keeps tenant programmability compatible with strong performance isolation. A tenant-programmable TCP on Chamelio reaches 9.2 Mreq/s, matching the hand-tuned TAS stack; joint compilation shrinks the programmability tax from 23.9% to 3.8%; and under a scaling TCP adversary that drives uninstrumented stacks to 154 microseconds, Chamelio bounds victim tail latency at 46 microseconds.
DC · Jun 3, 2020
Serving DNNs like Clockwork: Performance Predictability from the Bottom Up
Arpan Gujarati, Reza Karimi, Safya Alzayat et al.
Machine learning inference is becoming a core building block for interactive web applications. As a result, the underlying model serving systems on which these applications depend must consistently meet low latency targets. Existing model serving architectures use well-known reactive techniques to alleviate common-case sources of latency, but cannot effectively curtail tail latency caused by unpredictable execution times. Yet the underlying execution times are not fundamentally unpredictable: on the contrary, we observe that inference using Deep Neural Network (DNN) models has deterministic performance. Here, starting with the predictable execution times of individual DNN inferences, we adopt a principled design methodology to successively build a fully distributed model serving system that achieves predictable end-to-end performance. We evaluate our implementation, Clockwork, using production trace workloads, and show that Clockwork can support thousands of models while simultaneously meeting 100ms latency targets for 99.9999% of requests. We further demonstrate that Clockwork exploits predictable execution times to achieve tight request-level service-level objectives (SLOs) as well as a high degree of request-level performance isolation.
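One way to see why deterministic execution times help with SLOs: if each model's inference time is known in advance, a scheduler can compute a request's finish time before running it and decline work that would miss its deadline. The sketch below assumes a single worker, FIFO arrival order, and a fixed per-model latency table; the function `plan` and all the numbers are hypothetical and are not taken from Clockwork.

```python
def plan(requests, exec_ms):
    """Plan requests on one worker using known per-model execution times.

    requests: list of (arrival_ms, model, slo_ms), sorted by arrival time.
    exec_ms: dict mapping model -> deterministic execution time in ms
             (assumed known ahead of time, e.g. from profiling).
    Returns (schedule, rejected), where schedule holds
    (model, start_ms, finish_ms) tuples for admitted requests.
    """
    free_at = 0.0
    schedule, rejected = [], []
    for arrival, model, slo in requests:
        start = max(arrival, free_at)
        finish = start + exec_ms[model]
        if finish <= arrival + slo:
            # The finish time is known before execution, so admission is safe.
            schedule.append((model, start, finish))
            free_at = finish
        else:
            # Reject up front instead of running and missing the deadline.
            rejected.append(model)
    return schedule, rejected


exec_ms = {"small": 10.0, "large": 60.0}
requests = [
    (0.0, "large", 100.0),
    (5.0, "small", 100.0),
    (10.0, "large", 100.0),
]
schedule, rejected = plan(requests, exec_ms)
print(schedule, rejected)
```

The third request is turned away because it could not finish within 100 ms of its arrival; with unpredictable execution times, that decision could only be made after the deadline had already been missed.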