DCAP · May 14

Polynomial Histograms for Memory-Efficient Representation of Long-tailed System Distributions

arXiv:2605.30360 · 60.6 · h-index: 7
AI Analysis

This work provides a more memory-efficient method for representing performance metric distributions, which is crucial for engineers and operators managing large-scale distributed systems.

This paper addresses the challenge of efficiently representing long-tailed performance metric distributions in distributed systems by introducing polynomial histograms, which annotate each bin with moments of the underlying distribution in that bin. It proposes an information loss metric for binned data, characterizes analytically when polynomial histograms offer more information at a lower storage cost than traditional histograms, and applies the approach to file system metrics from a large production system.

Distributed systems must frequently keep track of many different types of performance metrics across many different computers. For example, the latency distribution of certain operations may be computed for a large combination of computers, users, and operations. These empirical distributions need to be collected at minimal expense on the individual software components, efficiently aggregated across multiple dimensions, and stored in a compact representation for a variety of downstream data analysis applications. We describe an information loss metric for binned data that allows us to optimize the trade-off between storage cost and information loss across different histogram representations. We explore the use of polynomial histograms, in which each bin of a histogram is annotated with moments of the underlying distribution in that bin. We compare these polynomial histograms with traditional histograms that spend the same storage budget on additional bins instead of on annotations in each bin. We describe an application of these techniques to file system metrics for a large production system, and analytically characterize when polynomial histograms offer more information at lower cost.
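The idea of annotating each bin with moments can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `PolyHistogram`, the choice of raw power sums (count, sum, sum of squares) as the stored moments, the clamping of out-of-range values, and the "one stored number equals one unit of storage" cost model are all assumptions made for illustration. The sketch shows why such a representation aggregates cheaply: merging two histograms with the same bin edges is element-wise addition.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class PolyHistogram:
    """Hypothetical histogram whose bins carry raw moments sum(x**k), k = 0..degree."""
    edges: list          # ascending boundaries; bin i covers [edges[i], edges[i+1])
    degree: int = 2      # highest moment stored per bin (assumed >= 1)
    moments: list = field(default_factory=list)

    def __post_init__(self):
        if not self.moments:
            n_bins = len(self.edges) - 1
            self.moments = [[0.0] * (self.degree + 1) for _ in range(n_bins)]

    def add(self, x):
        # Locate the bin, clamping out-of-range values into the end bins (an assumption).
        i = bisect_right(self.edges, x) - 1
        i = min(max(i, 0), len(self.moments) - 1)
        for k in range(self.degree + 1):
            self.moments[i][k] += x ** k

    def merge(self, other):
        # Power sums are additive, so aggregation across machines is element-wise.
        if self.edges != other.edges or self.degree != other.degree:
            raise ValueError("histograms must share edges and degree")
        merged = [[a + b for a, b in zip(m1, m2)]
                  for m1, m2 in zip(self.moments, other.moments)]
        return PolyHistogram(list(self.edges), self.degree, merged)

    def bin_mean(self, i):
        # Mean of the samples in bin i, recovered from the stored moments.
        count, total = self.moments[i][0], self.moments[i][1]
        return total / count if count else None

    def storage_cost(self):
        # Assumed cost model: one unit per stored number.
        return len(self.moments) * (self.degree + 1)


# Two machines record latencies separately, then the results are aggregated.
host_a = PolyHistogram([0, 10, 100, 1000])
for latency in (1, 2, 3, 50):
    host_a.add(latency)
host_b = PolyHistogram([0, 10, 100, 1000])
for latency in (5, 500):
    host_b.add(latency)
combined = host_a.merge(host_b)
```

Under this assumed cost model, three bins with degree 2 cost the same nine numbers as a traditional nine-bin count histogram, which is the kind of equal-storage comparison the abstract describes.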
