LG AR Nov 6, 2025

SLOFetch: Compressed-Hierarchical Instruction Prefetching for Cloud Microservices

arXiv:2511.04774v24 citations h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses latency and energy issues in cloud workloads, but it is incremental as it builds on existing prefetching techniques.

The paper tackles the problem of frontend stalls in cloud microservices by introducing SLOFetch, a compressed-hierarchical instruction prefetching design that reduces on-chip state while maintaining performance improvements similar to prior methods.

Large-scale networked services rely on deep software stacks and microservice orchestration, which increase instruction footprints and create frontend stalls that inflate tail latency and energy. We revisit instruction prefetching for these cloud workloads and present a design that aligns with SLO-driven and self-optimizing systems. Building on the Entangling Instruction Prefetcher (EIP), we introduce a Compressed Entry that captures up to eight destinations around a base using 36 bits by exploiting spatial clustering, and a Hierarchical Metadata Storage scheme that keeps only L1-resident and frequently queried entries on chip while virtualizing bulk metadata into lower levels. We further add a lightweight Online ML Controller that scores prefetch profitability using context features and a bandit-adjusted threshold. On data center applications, our approach preserves EIP-like speedups with smaller on-chip state and improves efficiency for networked services in the ML era.
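The abstract does not spell out the bit layout of the Compressed Entry, so the following is only an illustrative sketch of how eight spatially clustered destinations could fit in 36 bits. It assumes a hypothetical layout of eight 4-bit signed cache-line offsets (32 bits) plus a 4-bit valid count, with the base line address held outside the entry. The names `pack_entry` and `unpack_entry` and the field widths are assumptions for illustration, not the paper's design.

```python
# Hypothetical layout (an assumption, not taken from the paper):
#   bits  0..31 : eight 4-bit signed offsets, in cache lines, relative to the base
#   bits 32..35 : number of valid offsets (0..8)
OFFSET_BITS = 4
MAX_DESTS = 8
COUNT_SHIFT = MAX_DESTS * OFFSET_BITS    # 32
ENTRY_BITS = COUNT_SHIFT + 4             # 36
MIN_DELTA = -(1 << (OFFSET_BITS - 1))    # -8
MAX_DELTA = (1 << (OFFSET_BITS - 1)) - 1 # 7


def pack_entry(base_line, dest_lines):
    """Pack destinations near base_line into one 36-bit integer.

    Returns (entry, overflow); overflow lists destinations that were too far
    from the base or arrived after all eight slots were filled.
    """
    entry = 0
    count = 0
    overflow = []
    for dest in dest_lines:
        delta = dest - base_line
        if count < MAX_DESTS and MIN_DELTA <= delta <= MAX_DELTA:
            # Two's-complement encode the offset into its 4-bit slot.
            entry |= (delta & 0xF) << (count * OFFSET_BITS)
            count += 1
        else:
            overflow.append(dest)
    entry |= count << COUNT_SHIFT
    return entry, overflow


def unpack_entry(base_line, entry):
    """Recover the destination line addresses stored in a packed entry."""
    count = (entry >> COUNT_SHIFT) & 0xF
    dests = []
    for slot in range(count):
        raw = (entry >> (slot * OFFSET_BITS)) & 0xF
        delta = raw - 16 if raw >= 8 else raw  # sign-extend the 4-bit value
        dests.append(base_line + delta)
    return dests
```

In this sketch, destinations that fall outside the offset window are returned separately. A real design would need some policy for them, such as a full-width fallback entry, which the abstract does not describe.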
