DSMar 11

Huffman-Bucket Sketch: A Simple $O(m)$ Algorithm for Cardinality Estimation

arXiv:2603.10930v17.9h-index: 6

Predicted impact top 31% in DS · last 90 daysOriginality Incremental advance

AI Analysis

This addresses memory efficiency for large-scale data streaming applications, though it appears incremental as it builds on existing HLL methods.

The paper tackles the problem of high memory usage in HyperLogLog sketches for cardinality estimation by introducing the Huffman-Bucket Sketch, which compresses an HLL sketch to optimal space O(m+log n) bits with amortized constant-time updates, reducing memory requirements while retaining mergeability.

We introduce the Huffman-Bucket Sketch (HBS), a simple, mergeable data structure that losslessly compresses a HyperLogLog (HLL) sketch with $m$ registers to optimal space $O(m+\log n)$ bits, with amortized constant-time updates, acting as a drop-in replacement for HLL that retains mergeability and substantially reduces memory requirements. We partition registers into small buckets and encode their values with a global Huffman codebook derived from the strongly concentrated HLL rank distribution, using the current cardinality estimate for determining the mode of the distribution. We prove that the Huffman tree needs rebuilding only $O(\log n)$ times over a stream, roughly when cardinality doubles. The framework can be extended to other sketches with similar strongly concentrated distributions. We provide preliminary numerical evidence that suggests that HBS is practical and can potentially be competitive with state-of-the-art in practice.

View on arXiv PDF

Similar