DS · DB · IT · IT · Mar 15

Sublime: Sublinear Error & Space for Unbounded Skewed Streams

arXiv:2603.14190 · 33.3 · h-index: 17
AI Analysis

Sublime targets a bottleneck in stream processing systems that track key frequencies in real time: the memory footprint and error growth of frequency estimation sketches.

The paper tackles memory inefficiency and error scaling in frequency estimation sketches for data streams. It introduces Sublime, a framework that dynamically adapts to stream skew and length, and reports significant improvements in accuracy and memory usage over state-of-the-art methods.

Modern stream processing systems must often track the frequency of distinct keys in a data stream in real time. Since monitoring the exact counts often entails a prohibitive memory footprint, many applications rely on compact, probabilistic data structures called frequency estimation sketches to approximate them. However, mainstream frequency estimation sketches fall short in two critical aspects: (1) They are memory-inefficient under data skew. This is because they use uniformly sized counters to track the key counts and thus waste memory on storing the leading zeros of many small counter values. (2) Their estimation error deteriorates at least linearly with the stream's length, which may grow indefinitely over time. This is because they count the keys using a fixed number of counters. We present Sublime, a framework that generalizes frequency estimation sketches to address these problems by dynamically adapting to the stream's skew and length. To save memory under skew, Sublime uses short counters upfront and, as they overflow, elongates them with extensions stored within the same cache line. It leverages novel bit manipulation routines to quickly access a counter's extension. It also controls the scaling of its error rate by expanding its number of approximate counters as the stream grows. We apply Sublime to Count-Min Sketch and Count Sketch. We show, theoretically and empirically, that Sublime significantly improves accuracy and memory over the state of the art while maintaining competitive or superior performance.
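To make the two shortcomings concrete, here is a minimal sketch of a textbook Count-Min Sketch, the baseline the abstract describes, not Sublime itself. The class name, the BLAKE2b-based row hashing, the 32-bit counter width, and the helper `unused_bit_fraction` are all assumptions made for illustration and do not come from the paper.

```python
import hashlib
import random


class CountMinSketch:
    """Textbook Count-Min Sketch with a fixed grid of counters (illustrative)."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        # Fixed number of counters: depth rows, width counters per row.
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # Assumption: BLAKE2b salted with the row number stands in for
        # the per-row hash functions; any independent hashes would do.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += count

    def estimate(self, key: str) -> int:
        # Collisions only add to a counter, so the minimum over rows
        # never underestimates the true count.
        return min(self.rows[r][self._index(key, r)] for r in range(self.depth))


def unused_bit_fraction(sketch: CountMinSketch, counter_bits: int = 32) -> float:
    """Fraction of counter bits that hold only leading zeros.

    Assumes every counter is stored in `counter_bits` bits (hypothetical width).
    """
    total_bits = sketch.width * sketch.depth * counter_bits
    used_bits = sum(max(v.bit_length(), 1) for row in sketch.rows for v in row)
    return 1.0 - used_bits / total_bits


if __name__ == "__main__":
    rng = random.Random(42)
    keys = [f"k{i}" for i in range(1000)]
    weights = [1.0 / (i + 1) for i in range(1000)]  # Zipf-like skew
    sketch = CountMinSketch(width=256, depth=4)
    truth: dict[str, int] = {}
    seen = 0
    # Grow the same stream in two stages and report the average
    # overestimate and the share of unused counter bits at each stage.
    for target in (10_000, 100_000):
        for key in rng.choices(keys, weights=weights, k=target - seen):
            sketch.add(key)
            truth[key] = truth.get(key, 0) + 1
        seen = target
        avg_err = sum(sketch.estimate(k) - c for k, c in truth.items()) / len(truth)
        print(target, round(avg_err, 1), round(unused_bit_fraction(sketch), 3))
```

Because the grid of counters never grows, each key's overestimate can only increase as more items arrive, and on a skewed stream most counters hold small values whose high-order bits stay zero. These are the two behaviors the abstract attributes to mainstream sketches; how Sublime's counter extensions and counter expansion address them is described in the paper and is not modeled here.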

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
