DB | DC | IR | Mar 17

HierarchicalKV: A GPU Hash Table with Cache Semantics for Continuous Online Embedding Storage

arXiv:2603.17168 | 86.1 | h-index: 11 | Has Code
AI Analysis

HierarchicalKV addresses memory inefficiency in GPU-based embedding storage for large-scale recommendation systems, offering a practical solution with measured performance gains.

The paper targets a specific inefficiency: conventional GPU hash tables keep every inserted key, which wastes memory once embedding tables exceed GPU capacity. HierarchicalKV (HKV) instead applies cache semantics, resolving a full bucket by eviction or admission rejection rather than by rehashing or failing. On an NVIDIA H100 NVL GPU it reaches up to 3.9 billion key-value pairs per second in find throughput, 1.4x higher than WarpCore, the strongest dictionary-semantic baseline.

Traditional GPU hash tables preserve every inserted key -- a dictionary assumption that wastes scarce High Bandwidth Memory (HBM) when embedding tables routinely exceed single-GPU capacity. We challenge this assumption with cache semantics, where policy-driven eviction is a first-class operation. We introduce HierarchicalKV (HKV), the first general-purpose GPU hash table library whose normal full-capacity operating contract is cache-semantic: each full-bucket upsert (update-or-insert) is resolved in place by eviction or admission rejection rather than by rehashing or capacity-induced failure. HKV co-designs four core mechanisms -- cache-line-aligned buckets, in-line score-driven upsert, score-based dynamic dual-bucket selection, and triple-group concurrency -- and uses tiered key-value separation as a scaling enabler beyond HBM. On an NVIDIA H100 NVL GPU, HKV achieves up to 3.9 billion key-value pairs per second (B-KV/s) find throughput, stable across load factors 0.50-1.00 (<5% variation), and delivers 1.4x higher find throughput than WarpCore (the strongest dictionary-semantic GPU baseline at lambda=0.50) and up to 2.6-9.4x over indirection-based GPU baselines. Since its open-source release in October 2022, HKV has been integrated into multiple open-source recommendation frameworks.
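The cache-semantic contract described in the abstract can be illustrated with a minimal CPU-side sketch in Python. This is not HKV's implementation or API. The class names `CacheBucket` and `CacheTable`, the return strings, the single-bucket hashing, and the tie-breaking rule (an incoming score equal to the minimum evicts rather than being rejected) are all assumptions made for illustration. The sketch shows only the core idea: a full-bucket upsert is resolved in place, either by evicting the lowest-scored entry or by rejecting the newcomer, so the table never grows or rehashes.

```python
class CacheBucket:
    """Fixed-capacity bucket; hypothetical stand-in for a cache-line-aligned GPU bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = {}  # key -> (value, score)

    def upsert(self, key, value, score):
        # Case 1: key already present, so update it in place.
        if key in self.slots:
            self.slots[key] = (value, score)
            return "updated"
        # Case 2: a free slot exists, so insert.
        if len(self.slots) < self.capacity:
            self.slots[key] = (value, score)
            return "inserted"
        # Case 3: bucket is full. Find the lowest-scored resident entry.
        victim = min(self.slots, key=lambda k: self.slots[k][1])
        if score < self.slots[victim][1]:
            # Admission rejection: the newcomer scores below every resident.
            return "rejected"
        # Eviction: replace the victim (assumed rule: ties favor the newcomer).
        del self.slots[victim]
        self.slots[key] = (value, score)
        return "evicted"


class CacheTable:
    """Array of buckets; each key maps to exactly one bucket (an assumed simplification)."""

    def __init__(self, num_buckets, bucket_capacity):
        self.buckets = [CacheBucket(bucket_capacity) for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def upsert(self, key, value, score):
        return self._bucket(key).upsert(key, value, score)

    def find(self, key):
        entry = self._bucket(key).slots.get(key)
        return None if entry is None else entry[0]


table = CacheTable(num_buckets=1, bucket_capacity=2)
results = [
    table.upsert(1, "a", score=10),
    table.upsert(2, "b", score=20),
    table.upsert(3, "c", score=5),   # bucket full, score too low
    table.upsert(4, "d", score=15),  # bucket full, displaces key 1
]
print(results)
```

In this sketch the score could stand for recency or access frequency, depending on the eviction policy. The dual-bucket selection, concurrency groups, and tiered key-value separation named in the abstract are not modeled here.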

