DCMay 19

Hypergraph Partitioning on GPU with Distinct Incident Hyperedges and Size Constraints

arXiv:2605.2049722.3

Predicted impact top 65% in DC · last 90 daysOriginality Incremental advance

AI Analysis

This work addresses the need for efficient hypergraph partitioning on GPUs for engineering applications with specific constraints, offering significant speedups and quality improvements over existing methods.

The paper introduces a GPU-centric algorithm for multi-level hypergraph partitioning under constraints of limited size and distinct inbound hyperedges, achieving an average 380x speedup and 1.2-2.0x reduction in connectivity over a sequential partitioner, and for k-way balanced partitioning, it runs 5x faster than CPU methods with ~5% quality loss for k=2.

Hypergraph partitioning is a recurring NP-hard problem in engineering; its efficient solution at scale hinges on parallelism. This work proposes a GPU-centric algorithm for multi-level hypergraph partitioning aimed at a specific set of problem constraints: limited size and distinct inbound hyperedges per partition. Manipulating hypergraphs requires deeply nested traversals and concurrent decision-making; our constraints impose further set operations amidst that. In turn, we design algorithms around the GPU's hierarchical parallelism and our problem's specifics. When forming partitions, we materialize the hypergraph's incidence structure and unique neighborhoods in memory to exploit set sparsity and batch node-pairing scores in shared memory. Upon refining partitions, we chain node moves into improving paths and cycles, checking their validity via cumulative set size variations reduced in parallel over moves. Thus, our dominant kernels exhibit a span linear in local hypergraph parameters. Results show an average 380x speedup and a 1.2-2.0x reduction in connectivity compared to a sequential multi-level partitioner. With minor changes, we also support k-way balanced partitioning, running 5x faster than CPU methods with a ~5% quality loss for k=2, outperforming an existing GPU partitioner at comparable runtime, with no measurable overhead from the added constraints handling logic.

View on arXiv PDF

Similar