DS · May 18

A Note on Second-Order Expected Maximum-Load Bounds for Binary Linear Hashing

arXiv:2605.18335 · 42.5
Predicted impact: top 26% in DS · last 90 days
Originality: Synthesis-oriented
AI Analysis

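For theoretical computer scientists studying hash functions, this note shows that binary linear hashing balances load nearly as well as fully independent hashing: its expected maximum-load bound matches the fully independent one in the leading term and in the dominant second-order term.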

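This paper refines the analysis of maximum load in binary linear hashing, proving that the expected maximum load is at most (1 + (1+o(1)) log log log n / log log n) · log n / log log n. This matches fully independent hashing in the leading term and matches the dominant second-order correction up to a 1+o(1) factor. For every R > 1 satisfying a mild growth condition, the bound on the probability that the maximum load is at least R log n / log log n improves from O(1/R^2) to O((log log n)^2 / (R^2 (log n)^{2-2/R})).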

Let $S\subseteq \mathbb{F}_2^u$ have size $n=2^\ell$, and let $h:\mathbb{F}_2^u\to \mathbb{F}_2^\ell$ be a uniformly random linear map. For $y\in \mathbb{F}_2^\ell$, write $\mathrm{Load}_h(y):=|h^{-1}(y)\cap S|$, and let $M(S,h):=\max_{y\in \mathbb{F}_2^\ell} \mathrm{Load}_h(y)$ be the maximum load. Jaber, Kumar and Zuckerman (STOC 2025) proved that the expected maximum load of $h$ on $S$ is at most $16\log n/\log\log n$, matching the fully independent keys-into-bins scale up to constants. Their proof also gives the tail estimate \[ \Pr\left[ M(S,h)\ge R\frac{\log n}{\log\log n} \right] \le O\left(\frac{1}{R^{2}}\right). \]

We record a base optimization in their exponential-potential method showing that binary linear hashing nearly matches fully independent hashing also at the level of the second-order maximum-load scale. For every $R>1$ satisfying $R\ell^{1-1/R}\ge D\ln\ell$, where $D$ is an absolute constant, we prove \[ \Pr\left[ M(S,h)\ge R\frac{\log n}{\log\log n} \right] \le O\left( \frac{(\log\log n)^2}{R^2(\log n)^{2-2/R}} \right). \] Integrating this tail yields \[ \mathbb{E}[M(S,h)] \le \left( 1+ (1+o(1)) \frac{\log\log\log n}{\log\log n} \right) \frac{\log n}{\log\log n}. \] Thus binary linear hashing matches fully independent hashing in the leading term and matches the dominant second-order correction up to a $1+o(1)$ factor.

We also prove, by an independent self-contained argument, a sharp tail bound for one prescribed bucket: for fixed $y\in \mathbb{F}_2^\ell$, \[ \Pr[\mathrm{Load}_h(y)>2^a-2]\le \gamma^{-1}2^{-a^2}, \] where $\gamma=\prod_{j\ge1}(1-2^{-j})$. A subspace construction shows that this is asymptotically tight even in the leading constant as $a\to\infty$. However, this controls only a fixed bucket; a direct union bound over all buckets loses a factor of $2^\ell$.
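The definitions above can be explored numerically. The following is a minimal sketch, not the paper's method: it samples a uniformly random linear map $h$ as $\ell$ independent uniform row vectors in $\mathbb{F}_2^u$ (stored as Python integers), computes $M(S,h)$ for a random key set $S$ of size $n=2^\ell$, and evaluates the fixed-bucket bound $\gamma^{-1}2^{-a^2}$. All function names, the parameters `u`, `ell`, `trials`, and the seed are illustrative assumptions. The reference value $\ln n/\ln\ln n$ is printed only for orientation, since the bounds are asymptotic and say little at such small $n$.

```python
import math
import random


def random_linear_map(u, ell, rng):
    """Sample an ell-by-u matrix over F_2 with i.i.d. uniform bits.

    Row i is stored as a u-bit integer mask.
    """
    return [rng.getrandbits(u) for _ in range(ell)]


def apply_map(rows, x):
    """Return h(x) in F_2^ell: bit i is the parity of <rows[i], x>."""
    y = 0
    for i, r in enumerate(rows):
        y |= (bin(r & x).count("1") & 1) << i
    return y


def max_load(keys, rows):
    """Return M(S, h): the largest number of keys sharing one bucket."""
    loads = {}
    for x in keys:
        y = apply_map(rows, x)
        loads[y] = loads.get(y, 0) + 1
    return max(loads.values())


def gamma(terms=200):
    """Approximate gamma = prod_{j>=1} (1 - 2^{-j}) by a truncated product."""
    p = 1.0
    for j in range(1, terms + 1):
        p *= 1.0 - 2.0 ** -j
    return p


def fixed_bucket_bound(a):
    """Evaluate the stated bound gamma^{-1} 2^{-a^2} on Pr[Load_h(y) > 2^a - 2]."""
    return 2.0 ** -(a * a) / gamma()


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed, chosen arbitrarily
    u, ell, trials = 24, 10, 20     # illustrative sizes only
    n = 2 ** ell
    keys = rng.sample(range(2 ** u), n)  # n distinct keys in F_2^u
    samples = [max_load(keys, random_linear_map(u, ell, rng))
               for _ in range(trials)]
    print("empirical mean max load:", sum(samples) / trials)
    print("ln n / ln ln n:", math.log(n) / math.log(math.log(n)))
    print("fixed-bucket bound at a = 3:", fixed_bucket_bound(3))
```

Multiplying `fixed_bucket_bound(a)` by $2^\ell$ illustrates the closing remark: the union bound becomes nontrivial only once $a^2$ exceeds roughly $\ell$, that is, at loads near $2^{\sqrt{\log_2 n}}$, far above the $\log n/\log\log n$ scale.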
