43.6OSApr 20
Equilibria: Fair Multi-Tenant CXL Memory Tiering At ScaleKaiyang Zhao, Neha Gholkar, Hasan Maruf et al.
Memory dominates datacenter system cost and power. Memory expansion via Compute Express Link (CXL) is an effective way to provide additional memory at lower cost and power, but its effective use requires software-level tiering for hyperscaler workloads. Existing tiering solutions, including current Linux support, face fundamental limitations in production deployments. First, they lack multi-tenancy support, failing to handle stacked homogeneous or heterogeneous workloads. Second, limited control-plane flexibility leads to fairness violations and performance variability. Finally, insufficient observability prevents operators from diagnosing performance pathologies at scale. We present Equilibria, an OS framework enabling fair, multi-tenant CXL tiering at datacenter scale. Equilibria provides per-container controls for memory fair-share allocation and fine-grained observability of tiered-memory usage and operations. It further enforces flexible, user-specified fairness policies through regulated promotion and demotion, and mitigates noisy-neighbor interference by suppressing thrashing. Evaluated in a large hyperscaler fleet using production workloads and benchmarks, Equilibria helps workloads meet service level objectives (SLOs) while avoiding performance interference. It improves performance over the state-of-the-art Linux solution, TPP, by up to 52% for production workloads and 1.7x for benchmarks. All Equilibria patches have been released to the Linux community.
CRDec 19, 2021
An Architecture for Exploiting Native User-Land Checkpoint-Restart to Improve FuzzingPrashant Singh Chouhan, Gregory Price, Gene Cooperman
Fuzzing is one of the most popular and widely used techniques to find vulnerabilities in any application. Fuzzers are fast enough, but they still spend a good portion of time to restart a crashed application and then fuzz it from the beginning. Fuzzing an application from a point deeper in the execution is also important. To do this, a user needs to take a snapshot of the program while fuzzing it on top of an emulator, virtual machine, or by utilizing a special kernel module to enable checkpointing. Even with this ability, it can be difficult to attach a fuzzer after restoring a checkpoint. As a result, most fuzzers leverage a form of fork-server design. We propose a novel testing architecture that allows users to attach a fuzzer after the program has started running. We do this by natively checkpointing the target application at a point of interest, and attaching the fuzzer after restoring the checkpoint. A fork-server may even be engaged at the point of restoration. This not only improves the throughput of the fuzzing campaign by minimizing startup time, but opens up a new way to fuzz applications. With this architecture, a user can take a series of checkpoints at points of interest, and run parallel tests to reduce the overall state-complexity of an individual test. Checkpoints allow us to begin fuzzing from a deeper point in the execution path, omitting prior execution from the required coverage path. This and other checkpointing techniques are described in the paper to help improve fuzzing.