SOLANET: Distributed Neighbor Graph Construction on GPU-Accelerated Systems
This work scales neighbor graph construction, a building block of many data analytics and AI workloads, to large datasets on distributed GPU systems.
SOLANET is a GPU-accelerated toolkit for distributed neighbor graph construction. It achieves an 11x speedup when scaling from 32 to 512 APUs on 1 billion data points and a 6.9x speedup when scaling from 64 to 512 APUs on 2 billion points, and its single-GPU algorithm outperforms a state-of-the-art GPU-based implementation.
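Because the baselines are 32 and 64 APUs rather than a single APU, these speedups can also be read as strong-scaling efficiency relative to each baseline: the measured speedup divided by the ideal (linear) speedup implied by the increase in APU count. The minimal sketch below uses only the figures quoted above; the helper name `scaling_efficiency` is hypothetical.

```python
def scaling_efficiency(speedup: float, base_apus: int, target_apus: int) -> float:
    """Strong-scaling efficiency relative to a multi-APU baseline."""
    ideal = target_apus / base_apus  # ideal (linear) speedup for this APU increase
    return speedup / ideal


# Speedups reported for SOLANET (fixed dataset size, more APUs).
runs = [
    ("1B points", 11.0, 32, 512),
    ("2B points", 6.9, 64, 512),
]

for label, speedup, base, target in runs:
    eff = scaling_efficiency(speedup, base, target)
    print(f"{label}: {base} -> {target} APUs, ideal {target / base:.0f}x, "
          f"measured {speedup}x, efficiency {eff:.1%}")
```

Going from 32 to 512 APUs is a 16x increase, so an 11x speedup corresponds to roughly 69% efficiency. Going from 64 to 512 APUs is an 8x increase, so 6.9x corresponds to roughly 86%.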
Neighbor graphs capture relationships among data points and are widely used in data analytics and AI workloads. Many studies have explored approximate construction methods for single-node systems, including GPUs. However, extending construction to distributed systems, both to handle larger datasets and to accelerate it further, remains challenging because of its irregular computation patterns. We present SOLANET, a GPU-accelerated distributed neighbor graph construction toolkit. SOLANET first partitions the data and constructs a local graph on each GPU, then refines these graphs via approximate nearest neighbor (ANN) searches over remote graphs pulled from other GPUs using MPI one-sided operations. SOLANET also provides a lock-free single-GPU neighbor graph construction algorithm for AMD GPUs. On a single MI300A APU, our single-GPU implementation outperforms a state-of-the-art GPU-based approximate neighbor graph construction implementation across multiple datasets. Furthermore, SOLANET achieves an 11x speedup from 32 to 512 APUs for 1 billion data points and a 6.9x speedup from 64 to 512 APUs for 2 billion points.
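The two-phase pipeline described above (a local graph per GPU, then refinement through ANN searches over graphs pulled from other GPUs) can be illustrated on a single machine. This is a minimal, hypothetical Python sketch, not SOLANET's implementation. Partitions stand in for GPUs, and plain Python dictionaries stand in for remote graphs that a real code would fetch with MPI one-sided operations. Local graphs are built by brute force because the abstract does not say how SOLANET builds them, and the refinement search is a generic best-first graph search. All function names and parameters (`k`, `ef`, round-robin partitioning, first-node entry points) are assumptions.

```python
import heapq
import math
import random


def dist(a, b):
    # Euclidean distance between two points (math.dist needs Python 3.8+).
    return math.dist(a, b)


def build_local_knn(points, ids, k):
    """Phase 1 stand-in: exact k-NN graph restricted to one partition's points."""
    graph = {}
    for i in ids:
        cands = ((dist(points[i], points[j]), j) for j in ids if j != i)
        graph[i] = heapq.nsmallest(k, cands)  # list of (distance, neighbor id)
    return graph


def greedy_search(points, graph, query, entry, k, ef):
    """Best-first ANN search over a k-NN graph, starting from node `entry`."""
    d0 = dist(query, points[entry])
    visited = {entry}
    frontier = [(d0, entry)]  # min-heap: closest unexpanded node first
    best = [(-d0, entry)]     # max-heap (negated distances) of the ef best so far
    while frontier:
        d, u = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break  # no remaining frontier node can improve the result set
        for _, v in graph[u]:
            if v in visited:
                continue
            visited.add(v)
            dv = dist(query, points[v])
            if len(best) < ef or dv < -best[0][0]:
                heapq.heappush(frontier, (dv, v))
                heapq.heappush(best, (-dv, v))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    return sorted((-nd, v) for nd, v in best)[:k]


def refine(points, local_graph, remote_graphs, k, ef=16):
    """Phase 2: merge each node's local neighbors with ANN hits in remote graphs."""
    refined = {}
    for i, nbrs in local_graph.items():
        merged = {v: d for d, v in nbrs}
        # Each remote graph stands in for one fetched from another GPU
        # (in a real MPI code, e.g. via MPI_Get on an RMA window).
        for rg in remote_graphs:
            entry = next(iter(rg))  # hypothetical choice: first node as entry point
            for d, v in greedy_search(points, rg, points[i], entry, k, ef):
                merged[v] = d
        refined[i] = heapq.nsmallest(k, ((d, v) for v, d in merged.items()))
    return refined


def recall(points, graphs, k):
    """Fraction of true global k nearest neighbors recovered by the graphs."""
    hits = total = 0
    for g in graphs:
        for i, nbrs in g.items():
            exact = heapq.nsmallest(
                k, ((dist(points[i], points[j]), j) for j in range(len(points)) if j != i))
            hits += len({j for _, j in exact} & {v for _, v in nbrs})
            total += k
    return hits / total


random.seed(0)
n, k, parts = 200, 8, 4
points = [(random.random(), random.random()) for _ in range(n)]
partitions = [list(range(p, n, parts)) for p in range(parts)]  # round-robin split
local = [build_local_knn(points, ids, k) for ids in partitions]
final = [refine(points, local[p], [g for q, g in enumerate(local) if q != p], k)
         for p in range(parts)]
print(f"recall local-only: {recall(points, local, k):.2f}, "
      f"after refinement: {recall(points, final, k):.2f}")
```

In this sketch, each node's refined list is the k closest points from a superset of its local list, so refinement can only keep or improve recall. In a distributed setting, the search width `ef` and the number of remote graphs searched would presumably trade accuracy against communication and compute cost.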