DSMar 11

Huffman-Bucket Sketch: A Simple $O(m)$ Algorithm for Cardinality Estimation

arXiv:2603.10930v111.0h-index: 8
Predicted impact top 31% in DS · last 90 daysOriginality Incremental advance
AI Analysis

This addresses memory efficiency for large-scale data streaming applications, though it appears incremental as it builds on existing HLL methods.

The paper tackles the problem of high memory usage in HyperLogLog sketches for cardinality estimation by introducing the Huffman-Bucket Sketch, which compresses an HLL sketch to optimal space O(m+log n) bits with amortized constant-time updates, reducing memory requirements while retaining mergeability.

We introduce the Huffman-Bucket Sketch (HBS), a simple, mergeable data structure that losslessly compresses a HyperLogLog (HLL) sketch with $m$ registers to optimal space $O(m+\log n)$ bits, with amortized constant-time updates, acting as a drop-in replacement for HLL that retains mergeability and substantially reduces memory requirements. We partition registers into small buckets and encode their values with a global Huffman codebook derived from the strongly concentrated HLL rank distribution, using the current cardinality estimate for determining the mode of the distribution. We prove that the Huffman tree needs rebuilding only $O(\log n)$ times over a stream, roughly when cardinality doubles. The framework can be extended to other sketches with similar strongly concentrated distributions. We provide preliminary numerical evidence that suggests that HBS is practical and can potentially be competitive with state-of-the-art in practice.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes