PLApr 19

Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution

Atharva Chougule, Alexander J Root, Rubens Lacouture, Bobby Yan, Rohan Yadav, Fredrik Kjolstad

arXiv:2604.1719843.2h-index: 7

Predicted impact top 30% in PL · last 90 daysOriginality Highly original

AI Analysis

For developers of sparse tensor algebra compilers, this work provides a provably load-balanced parallelization method that automates efficient code generation for irregular computations.

This paper presents the first partitioning algorithm that provably load balances any sparse tensor algebra expression across parallel execution units, generalizing parallel merging to multi-dimensional sparse data. The generated code is competitive with vendor libraries (0.73–3.4× geometric mean) and outperforms general-purpose strategies (2.0–6.4× geometric mean).

Sparse tensor algebra is challenging to efficiently parallelize due to the irregular, data-dependent, and potentially skewed structure of sparse computation. We propose the first partitioning algorithm that provably load balances the computation of any sparse tensor algebra expression across parallel execution units. Our algorithm generalizes parallel merging algorithms to any number of operands, and to multi-dimensional, hierarchical sparse data structures. We implement our algorithm within an existing sparse tensor algebra compilation framework to automatically generate parallel sparse tensor algebra kernels that target multi-core CPUs and GPUs. We show that our generated code is competitive with hand-implemented parallelization strategies used by vendor libraries like Intel MKL and NVIDIA cuSPARSE (geo-means of $0.73$--$3.4\times$) and \textsc{Taco} (geo-means of $1.0$--$2.4\times$), and significantly outperforms general-purpose strategies for sparse tensor expressions where specialized algorithms have not been developed (geo-means of $2.0$--$6.4\times$).

View on arXiv PDF

Similar