NI · Apr 30

Libra: Accelerating Socket I/O via Programmable Selective Data Copying

arXiv:2604.27686
AI Analysis

For cloud-native systems using unmodified L7 proxies, Libra addresses the bottleneck of full payload copying without breaking compatibility.

Libra reduces kernel-user data copying for L7 proxies by copying only metadata to user space and retaining payloads in the kernel, improving plaintext throughput by up to 4.2x and reducing P99 tail latency by over 90%.

Layer-7 (L7) proxies are critical to modern cloud-native systems, yet their performance is increasingly bottlenecked by copying entire payloads across the kernel-user boundary. Existing approaches reduce this overhead but typically sacrifice compatibility with unmodified POSIX applications, introduce new APIs, or require specialized environments. We show that, under conventional OS abstractions, fully eliminating kernel-user copies while preserving standard socket semantics for unmodified proxies is fundamentally impossible. This leads to a practical insight: in common L7 workloads, proxies inspect only small metadata (e.g., HTTP headers) for routing, while forwarding the bulk payload unchanged. Based on this insight, we present Libra, an OS-level selective-copy framework that copies only metadata to the user space and retains the bulk payload in the kernel for forwarding, reducing data movement without breaking compatibility. Libra uses eBPF to identify protocol-specific metadata boundaries and coordinate selective copy and payload reuse across receive and transmit paths, all without modifying the socket API. Implemented in Linux and evaluated with unmodified Nginx and HAProxy, Libra improves plaintext throughput by up to 4.2x and reduces P99 tail latency by over 90%. With hardware-offloaded kTLS, it boosts encrypted throughput by 2.0x and cuts tail latency by 65%.
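The selective-copy idea in the abstract can be illustrated with a small user-space sketch. This is not Libra's implementation, which operates inside the Linux kernel and uses eBPF to find the metadata boundary; here a Python `memoryview` merely stands in for payload bytes that stay in place, and the names `metadata_length` and `selective_copy` are hypothetical. The sketch assumes the metadata is an HTTP/1.1 header block terminated by a blank line.

```python
HEADER_END = b"\r\n\r\n"  # blank line that terminates an HTTP/1.1 header block


def metadata_length(buf: bytes) -> int:
    """Return the byte count up to and including the end of the headers.

    Returns 0 when the header block is not yet complete in `buf`.
    """
    idx = buf.find(HEADER_END)
    return 0 if idx < 0 else idx + len(HEADER_END)


def selective_copy(kernel_buf: bytes):
    """Copy only the metadata; hand back the payload as a zero-copy view.

    `kernel_buf` is a stand-in (assumption) for data held in a kernel
    socket buffer. Only the header bytes are duplicated.
    """
    view = memoryview(kernel_buf)
    n = metadata_length(kernel_buf)
    metadata = bytes(view[:n])  # the only real copy made here
    payload = view[n:]          # references the original buffer, no copy
    return metadata, payload


# A request with a small header block and a 4096-byte body.
request = (
    b"POST /upload HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Length: 4096\r\n"
    b"\r\n"
) + b"x" * 4096

metadata, payload = selective_copy(request)
copied_fraction = len(metadata) / len(request)
print(f"copied {len(metadata)} of {len(request)} bytes ({copied_fraction:.1%})")
```

In this toy request the header block is a small fraction of the total bytes, which is the asymmetry the abstract relies on: a proxy that routes on headers alone never needs the body in user space.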

