ITITMar 16

Space Upper Bounds for $α$-Perfect Hashing

arXiv:2603.1525114.5h-index: 3
Predicted impact top 77% in IT · last 90 daysOriginality Incremental advance
AI Analysis

This work addresses space efficiency in approximate hashing for data structures, but it is incremental as it extends known minimal perfect hashing results to an approximate setting.

The paper tackles the problem of constructing minimal α-perfect hash functions, which allow at most (1-α)k collisions in expectation, and proposes schemes that improve space requirements over a baseline, with specific space bounds analyzed.

In the problem of perfect hashing, we are given a size $k$ subset $\mathcal{A}$ of a universe of keys $[n] = \{1,2, \cdots, n\}$, for which we wish to construct a hash function $h: [n] \to [b]$ such that $h(\cdot)$ maps $\mathcal{A}$ to $[b]$ with no collisions, i.e., the restriction of $h(\cdot)$ to $\mathcal{A}$ is injective. When $b=k$, the problem is referred to as minimal perfect hashing. In this paper, we extend the study of minimal perfect hashing to the approximate setting. For some $α\in [0, 1]$, we say that a randomized hashing scheme is $α$-perfect if for any input $\mathcal{A}$ of size $k$, it outputs a hash function which exhibits at most $(1-α)k$ collisions on $\mathcal{A}$ in expectation. One important performance consideration for any hashing scheme is the space required to store the hash functions. For minimal perfect hashing, i.e., $b = k$, it is well known that approximately $k\log(e)$ bits, or $\log(e)$ bits per key, is required to store the hash function. In this paper, we propose schemes for constructing minimal $α$-perfect hash functions and analyze their space requirements. We begin by presenting a simple base-line scheme which randomizes between perfect hashing and zero-bit random hashing. We then present a more sophisticated hashing scheme based on sampling which significantly improves upon the space requirement of the aforementioned strategy for all values of $α$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes