39.2 | DC | Mar 24
Communication-Aware Diffusion Load Balancing for Persistently Interacting Objects
Maya Taylor, Kavitha Chandrasekar, Laxmikant V. Kale
Parallel applications with irregular and time-varying workloads often suffer from load imbalance. Dynamic load balancing techniques address this challenge by redistributing work during execution. We present a new type of distributed diffusion-based load balancing targeted at communication-intensive applications with persistently communicating objects. Leveraging the application's communication graph, our strategy reduces across-node communication while simultaneously distributing load effectively. We also propose an algorithmic variant for cases where the communication patterns are not readily available. We explore optimizations to our algorithm and compare it with related load balancing strategies, both in simulation and on a Particle-in-Cell benchmark on up to 8 nodes of Perlmutter at NERSC.
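For readers unfamiliar with diffusion-based balancing, the following is a minimal sketch of the classic first-order diffusion scheme, in which each node repeatedly exchanges a fraction of its load difference with each neighbor. It is not the communication-aware strategy described in the abstract, which the abstract does not specify in detail; the function name `diffuse`, the parameter `alpha`, and the four-node ring topology are all illustrative assumptions.

```python
def diffuse(loads, neighbors, alpha=0.25, steps=50):
    """Generic first-order diffusion (an assumed textbook scheme, not the paper's).

    loads:     list of per-node loads
    neighbors: dict mapping node index -> list of neighbor indices (symmetric)
    alpha:     fraction of each pairwise load difference moved per step
    """
    loads = list(loads)
    for _ in range(steps):
        new = loads[:]
        for i, nbrs in neighbors.items():
            for j in nbrs:
                # Node i gains (or loses) a share of the difference with neighbor j.
                new[i] += alpha * (loads[j] - loads[i])
        loads = new
    return loads


# Hypothetical example: four nodes connected in a ring, with uneven initial load.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
balanced = diffuse([10.0, 0.0, 0.0, 2.0], ring)
```

Because the neighbor graph is symmetric, every transfer out of one node is matched by a transfer into another, so total load is conserved while the per-node loads approach the mean. A communication-aware variant would additionally decide which objects to move so that heavily communicating objects stay on the same node.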
11.3 | PF | Mar 24
Numerical Kernels on a Spatial Accelerator: A Study of Tenstorrent Wormhole
Maya Taylor, Carl Pearson, Luc Berger-Vergiat et al.
As AI accelerators gain prominence, their potential for traditional scientific computing workloads remains unclear. This paper explores Tenstorrent's Wormhole architecture, a spatial computing platform designed for neural network acceleration, by implementing three numerical kernels and composing them into a conjugate gradient solver. We present architecture-specific optimizations for sparse numerical algorithms, evaluate their performance against Nvidia GPUs, and expose both challenges and opportunities in porting numerical methods to spatial architectures. Our results demonstrate that AI accelerators merit consideration for workloads traditionally dominated by CPUs and GPUs, and more work should be invested in understanding the capabilities of these architectures and making them accessible to the scientific computing community.
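To illustrate how a conjugate gradient solver can be composed from a small set of kernels, here is a minimal pure-Python sketch. The abstract does not name its three kernels; this sketch assumes they are a sparse matrix-vector product (`spmv`, CSR format), a dot product (`dot`), and a scaled vector add (`axpy`). All function names and the test matrix are hypothetical, and nothing here reflects the Wormhole programming model.

```python
def spmv(indptr, indices, data, x):
    """y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def dot(a, b):
    """Inner product of two vectors."""
    return sum(p * q for p, q in zip(a, b))


def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A using only the kernels above."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = spmv(indptr, indices, data, p)
        alpha = rs_old / dot(p, Ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)
        rs_new = dot(r, r)
        p = axpy(rs_new / rs_old, p, r)
        rs_old = rs_new
    return x


# Hypothetical 3x3 SPD tridiagonal system [[4,1,0],[1,3,1],[0,1,2]] in CSR form.
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
b = [1.0, 2.0, 3.0]
x = conjugate_gradient(indptr, indices, data, b)
```

Each iteration performs one `spmv`, two `dot` calls, and three `axpy` calls, which is why the performance of these few kernels largely determines the performance of the solver as a whole.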