Elastic Sketch under Random Stationary Streams: Limiting Behavior and Near-Optimal Configuration
This work provides a theoretical foundation for tuning Elastic-Sketch parameters to improve the memory-accuracy trade-off in streaming applications, although the contribution is incremental in that it builds on existing methods.
The paper addresses the problem of configuring Elastic-Sketch, a data structure for counting item occurrences in streams. It analyzes the sketch under a stationary random stream model and derives closed-form expressions for the limiting distribution of the counters and the resulting expected counting error, which enable efficient parameter tuning and reduce the search space for optimal configurations.
\texttt{Elastic-Sketch} is a hash-based data structure for counting item occurrences in a data stream, and it has been shown empirically to achieve a better memory-accuracy trade-off than classical methods. The algorithm combines a \textit{heavy block}, which aims to maintain exact counts for a small set of dynamically \textit{elected} items, with a \textit{light block} that implements \texttt{Count-Min Sketch} (\texttt{CM}) to summarize the remaining traffic. The heavy block dynamics are governed by a hash function~$\beta$ that maps items into~$m_1$ buckets, and by an \textit{eviction threshold}~$\lambda$, which controls how easily an elected item can be replaced. We show that the performance of \texttt{Elastic-Sketch} depends strongly on the stream characteristics and on the choice of~$\lambda$. Since optimal parameter choices depend on unknown stream properties, we analyze \texttt{Elastic-Sketch} under a \textit{stationary random stream} model, a common assumption that captures the statistical regularities observed in real workloads. Formally, as the stream length goes to infinity, we derive closed-form expressions for the limiting distribution of the counters and the resulting expected counting error. These expressions are efficiently computable, enabling practical grid-based tuning of the memory split between the heavy block and the \texttt{CM} block (via~$m_1$) and of the eviction threshold~$\lambda$. We further characterize the structure of the optimal eviction threshold, which substantially reduces the search space and shows how this threshold depends on the arrival distribution. Extensive numerical simulations validate our asymptotic results on finite streams drawn from the Zipf distribution.
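To make the roles of $m_1$ and the eviction threshold concrete, the following is a minimal Python sketch of the structure described above. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the election rule, so the code assumes the vote-based rule commonly associated with the original Elastic Sketch design (a bucket's occupant is evicted once its negative votes reach the threshold times its positive votes). The class name `ElasticSketch`, the parameters `cm_width`, `cm_depth`, and `lam`, and the salted use of Python's built-in `hash` are all hypothetical choices made for this example.

```python
import random


class ElasticSketch:
    """Toy Elastic-Sketch: a heavy block of m1 buckets plus a Count-Min light block."""

    def __init__(self, m1, cm_width, cm_depth, lam, seed=0):
        rng = random.Random(seed)
        self.m1 = m1
        self.lam = lam  # eviction threshold (assumed vote-ratio semantics)
        self.cm_width = cm_width
        # Random salts stand in for independent hash functions (an assumption).
        self.heavy_salt = rng.getrandbits(64)
        self.cm_salts = [rng.getrandbits(64) for _ in range(cm_depth)]
        # Each occupied heavy bucket holds [key, positive votes, negative votes, flag].
        self.heavy = [None] * m1
        self.cm = [[0] * cm_width for _ in range(cm_depth)]

    def _bucket(self, item):
        return hash((self.heavy_salt, item)) % self.m1

    def _cm_add(self, item, count):
        for row, salt in zip(self.cm, self.cm_salts):
            row[hash((salt, item)) % self.cm_width] += count

    def _cm_query(self, item):
        return min(
            row[hash((salt, item)) % self.cm_width]
            for row, salt in zip(self.cm, self.cm_salts)
        )

    def insert(self, item):
        b = self._bucket(item)
        slot = self.heavy[b]
        if slot is None:
            # Empty bucket: elect the item; its count is exact while it stays.
            self.heavy[b] = [item, 1, 0, False]
        elif slot[0] == item:
            slot[1] += 1
        else:
            slot[2] += 1
            if slot[2] >= self.lam * slot[1]:
                # Evict: move the old occupant's count to the light block and
                # elect the newcomer. The flag records that part of the
                # newcomer's count may already sit in the light block.
                self._cm_add(slot[0], slot[1])
                self.heavy[b] = [item, 1, 1, True]
            else:
                # Occupant survives: the newcomer is counted in the light block.
                self._cm_add(item, 1)

    def query(self, item):
        slot = self.heavy[self._bucket(item)]
        if slot is not None and slot[0] == item:
            return slot[1] + (self._cm_query(item) if slot[3] else 0)
        return self._cm_query(item)


if __name__ == "__main__":
    sketch = ElasticSketch(m1=4, cm_width=64, cm_depth=2, lam=8)
    for item in ["a", "b", "a", "c", "a"]:
        sketch.insert(item)
    print(sketch.query("a"))
```

In this sketch a small `lam` makes elected items easy to replace, while a large `lam` keeps early occupants in place; for a fixed memory budget, increasing `m1` leaves less room for the light block. These are the two quantities that the grid-based tuning described above would vary.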