SE · Apr 18

Gleaner: A Semantically-Rich and Efficient Online Sampler for Microservice Diagnostics

arXiv:2604.16810 · 45.3 · h-index: 2
Predicted impact top 57% in SE · last 90 days · Originality: Highly original
AI Analysis

For operators of microservice systems, Gleaner enables high-fidelity online sampling for diagnostics, transforming sampling from data reduction to signal enhancement.

Gleaner is an online tail-sampling framework for microservice diagnostics that replaces expensive graph analysis with efficient set-based operations on bag-of-edges representations augmented with log semantics. It processes each trace in 0.74 ms, improves Trace Pattern Coverage by up to 128.7% and Shannon Entropy by up to 32.9% over baselines, and at a 1% sampling rate improves RCA accuracy by 42% to 107% over the next-best sampler, even surpassing the RCA accuracy obtained on the full, unsampled dataset.

Distributed tracing in microservices is critical for diagnostics but generates overwhelming data volumes, which makes intelligent sampling a necessity. To maximize fidelity, state-of-the-art (SOTA) tail-based samplers analyze complete (or even log-enriched) traces by modeling them as graphs. However, this reliance on computationally expensive graph analysis creates a performance bottleneck that prohibits their use in online settings. To remove this bottleneck, we propose Gleaner, an online tail-sampling framework that breaks the trade-off between sampling fidelity and processing cost. It is founded on the key insight that explicit graph structures are unnecessary for high-fidelity trace grouping. Instead, Gleaner represents each trace as a "bag-of-edges" augmented with log semantics, replacing slow graph algorithms with highly efficient set-based operations. It also employs an alarm-driven quota and a diversity-preserving strategy to prioritize anomalous and rare traces for downstream Root Cause Analysis (RCA). Experimentally, Gleaner processes each trace in 0.74 ms and improves Trace Pattern Coverage by up to 128.7% and Shannon Entropy by up to 32.9% over baselines. At just a 1% sampling rate, Gleaner improves RCA accuracy by 42% to 107% over the next-best sampler. Moreover, RCA on Gleaner's sampled data is more accurate than RCA on the entire, unsampled dataset. This result reframes intelligent sampling from a data reduction technique to a signal enhancement paradigm for automated operations.
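To make the "bag-of-edges" idea concrete, the following is a minimal sketch of grouping traces with set operations instead of graph algorithms. It is not Gleaner's implementation: the edge shape `(caller, callee, log_template)`, the use of Jaccard similarity, the greedy first-match grouping, and the `threshold` value are all assumptions chosen for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical edge shape: (caller service, callee service, log template).
# The abstract only says edges are "augmented with log semantics".
Edge = Tuple[str, str, str]


def bag_of_edges(edges: Iterable[Edge]) -> FrozenSet[Edge]:
    """Collapse a trace into an unordered set of its annotated call edges."""
    return frozenset(edges)


def jaccard(a: FrozenSet[Edge], b: FrozenSet[Edge]) -> float:
    """Set-overlap similarity; an assumed stand-in for the paper's measure."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_traces(bags: List[FrozenSet[Edge]], threshold: float = 0.8) -> List[List[int]]:
    """Greedy one-pass grouping: a trace joins the first group whose
    representative bag is at least `threshold` similar, else starts a new group."""
    representatives: List[FrozenSet[Edge]] = []
    groups: List[List[int]] = []
    for index, bag in enumerate(bags):
        for group_id, representative in enumerate(representatives):
            if jaccard(bag, representative) >= threshold:
                groups[group_id].append(index)
                break
        else:
            representatives.append(bag)
            groups.append([index])
    return groups


if __name__ == "__main__":
    normal_a = bag_of_edges([("gateway", "cart", "ok"), ("cart", "db", "ok")])
    normal_b = bag_of_edges([("cart", "db", "ok"), ("gateway", "cart", "ok")])
    failing = bag_of_edges([("gateway", "cart", "ok"), ("cart", "db", "timeout")])
    print(group_traces([normal_a, normal_b, failing]))
```

In this sketch the two normal traces land in one group regardless of span order, while the trace with a different log template on the `cart` to `db` edge forms its own group. A sampler could then favor small groups to keep rare patterns, which is one plausible reading of the diversity-preserving strategy mentioned above.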

