IR · Apr 13

KScaNN: Scalable Approximate Nearest Neighbor Search on Kunpeng

arXiv:2511.03298 · 28.3 · h-index: 4
Predicted impact: top 93% in IR · last 90 days · Originality: Incremental advance
AI Analysis

This work addresses the need for high-performance ANNS on ARM servers, which are increasingly adopted in industry, and positions itself as a blueprint for achieving leadership-class performance on modern ARM architectures.

KScaNN is an ANNS algorithm co-designed for the ARM-based Kunpeng 920 processor. It achieves up to a 1.63x speedup over the fastest x86-based solution, closing the ARM-x86 performance gap and setting a new standard for ARM-based vector search.

Approximate Nearest Neighbor Search (ANNS) is a cornerstone algorithm for information retrieval, recommendation systems, and machine learning applications. While x86-based architectures have historically dominated this domain, the increasing adoption of ARM-based servers in industry presents a critical need for ANNS solutions optimized for ARM architectures. A naive port of existing x86 ANNS algorithms to ARM platforms results in a substantial performance deficit, failing to leverage the unique capabilities of the underlying hardware. To address this challenge, we introduce KScaNN, a novel ANNS algorithm co-designed for the Kunpeng 920 ARM architecture. KScaNN takes a holistic approach that combines sophisticated, data-aware algorithmic refinements with carefully designed, hardware-specific optimizations. Its core contributions include: 1) novel algorithmic techniques, including a hybrid intra-cluster search strategy and an improved product quantization (PQ) residual calculation method, which optimize the search process at a higher level; 2) an ML-driven adaptive search module that tunes search parameters per query, eliminating the inefficiencies of static configurations; and 3) highly optimized SIMD kernels for ARM that maximize hardware utilization for the critical distance computation workloads. The experimental results demonstrate that KScaNN not only closes the performance gap but establishes a new standard, achieving up to a 1.63x speedup over the fastest x86-based solution. This work provides a definitive blueprint for achieving leadership-class performance for vector search on modern ARM architectures and underscores
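The abstract does not spell out KScaNN's improved PQ residual calculation or its hybrid intra-cluster search. As background only, here is a minimal Python sketch of the standard IVF-PQ scheme that such methods build on. Vectors are assigned to coarse clusters, the residual to the cluster centroid is product-quantized, and queries are scored by asymmetric distance computation (ADC) with per-cluster lookup tables. This is not KScaNN's method: the helper names (`build`, `encode`, `search`) are hypothetical, and for brevity the coarse centroids and codebooks are sampled points rather than k-means centroids. The lookup-table accumulation in the inner loop is the kind of distance computation that SIMD kernels typically target.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))


def split(vec, m):
    """Split vec into m contiguous sub-vectors (len(vec) must be divisible by m)."""
    d = len(vec) // m
    return [vec[j * d:(j + 1) * d] for j in range(m)]


def residual(x, c):
    """Residual of x with respect to coarse centroid c."""
    return [xi - ci for xi, ci in zip(x, c)]


def encode(x, coarse, codebooks):
    """Assign x to a coarse cluster, then PQ-encode its residual (hypothetical helper)."""
    c = nearest(x, coarse)
    r = residual(x, coarse[c])
    codes = [nearest(sub, cb) for sub, cb in zip(split(r, len(codebooks)), codebooks)]
    return c, codes


def build(data, n_coarse, m, n_codes, seed=0):
    """Build a toy IVF-PQ index (hypothetical helper)."""
    rng = random.Random(seed)
    # Simplification: coarse centroids are sampled data points, not k-means output.
    coarse = rng.sample(data, n_coarse)
    residuals = [residual(x, coarse[nearest(x, coarse)]) for x in data]
    # Simplification: each sub-codebook holds sampled residual sub-vectors
    # instead of k-means centroids trained per subspace.
    codebooks = [[split(r, m)[j] for r in rng.sample(residuals, n_codes)]
                 for j in range(m)]
    inverted = {}  # cluster id -> list of (vector id, PQ codes)
    for vid, x in enumerate(data):
        c, codes = encode(x, coarse, codebooks)
        inverted.setdefault(c, []).append((vid, codes))
    return coarse, codebooks, inverted


def search(q, coarse, codebooks, inverted, nprobe, k):
    """Return the ids of the k approximate nearest neighbours of q."""
    m = len(codebooks)
    # Probe only the nprobe coarse clusters closest to the query.
    probe = sorted(range(len(coarse)), key=lambda i: sq_dist(q, coarse[i]))[:nprobe]
    results = []
    for c in probe:
        # The query residual depends on the cluster, so the table is rebuilt per cluster.
        q_subs = split(residual(q, coarse[c]), m)
        lut = [[sq_dist(sub, cw) for cw in cb] for sub, cb in zip(q_subs, codebooks)]
        for vid, codes in inverted.get(c, []):
            # ADC: the approximate distance is a sum of m table lookups.
            dist = sum(lut[j][code] for j, code in enumerate(codes))
            results.append((dist, vid))
    results.sort()
    return [vid for _, vid in results[:k]]
```

In this toy setup, setting `n_codes` to the dataset size makes every residual its own codeword, so encoding is lossless and probing all clusters gives exact results. Shrinking `n_codes` or `nprobe` trades accuracy for speed, and `nprobe` is the kind of search parameter a per-query tuning module could adjust instead of fixing it statically.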
