DC · PF · Mar 24

Communication-Aware Diffusion Load Balancing for Persistently Interacting Objects

arXiv:2603.23329 · 39.2 · h-index: 60
AI Analysis

This paper addresses load balancing for communication-intensive applications with persistently interacting objects and offers a domain-specific, incremental improvement.

The paper tackles load imbalance in parallel applications with irregular, time-varying workloads. It introduces a distributed, diffusion-based load balancing method that uses the application's communication graph to reduce cross-node communication while still distributing load effectively, and it reports improvements on a Particle-in-Cell benchmark running on up to 8 nodes.

Parallel applications with irregular and time-varying workloads often suffer from load imbalance. Dynamic load balancing techniques address this challenge by redistributing work during execution. We present a new type of distributed diffusion-based load balancing targeted at communication-intensive applications with persistently communicating objects. Leveraging the application's communication graph, our strategy reduces cross-node communication while simultaneously distributing load effectively. We also propose an algorithmic variant for cases where the communication patterns are not readily available. We explore optimizations to our algorithm and compare it with related load balancing strategies, both in simulation and on a Particle-in-Cell benchmark running on up to 8 nodes of Perlmutter at NERSC.
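The abstract does not spell out the algorithm, so the following is only a rough illustration of the two ingredients it names, not the authors' method. It pairs textbook first-order diffusion (each node repeatedly shifts a fraction alpha of its load difference to each neighbor) with a hypothetical greedy rule for choosing which objects to migrate: prefer objects that exchange the most data with objects already on the receiving node, so that migration shrinks cross-node traffic. The function names (`diffusion_flows`, `cut_gain`, `select_objects`), the `alpha` parameter, the gain heuristic, and the dictionary-based data layout are all assumptions. The loop is a sequential simulation; a real distributed version would have each node exchange load values only with its neighbors.

```python
from collections import defaultdict


def diffusion_flows(loads, neighbors, alpha=0.25, iterations=100):
    """First-order diffusion over the node graph (textbook scheme, not the paper's).

    loads: {node: load}; neighbors: {node: [adjacent nodes]} (symmetric).
    Returns {(src, dst): amount}: the net load src should send to dst.
    alpha should be below 1 / (max_degree + 1) so the iteration converges.
    """
    current = dict(loads)
    flow = defaultdict(float)  # cumulative net transfer on each directed edge
    for _ in range(iterations):
        # Every edge's transfer is computed from the same snapshot of loads.
        step = {(i, j): alpha * (current[i] - current[j])
                for i in current for j in neighbors[i]}
        for (i, j), amount in step.items():
            flow[(i, j)] += amount
            current[i] -= amount  # w_i <- w_i + alpha * sum_j (w_j - w_i)
    # Keep only positive entries: each edge appears once, in the sending direction.
    return {edge: f for edge, f in flow.items() if f > 1e-9}


def cut_gain(obj, src, dst, placement, comm):
    """Hypothetical heuristic: drop in cross-node volume if obj moves src -> dst."""
    gain = 0.0
    for other, volume in comm.get(obj, {}).items():
        if placement[other] == dst:
            gain += volume  # this edge becomes node-local
        elif placement[other] == src:
            gain -= volume  # this edge becomes cross-node
    return gain


def select_objects(src, dst, quota, placement, obj_load, comm, tol=1e-9):
    """Greedily move objects from src to dst, totalling at most `quota` load.

    Prefers objects that communicate most with objects already on dst.
    Gains are recomputed after every move. Mutates `placement` in place.
    """
    moved, chosen = 0.0, []
    candidates = [o for o, node in placement.items() if node == src]
    while candidates:
        candidates.sort(key=lambda o: cut_gain(o, src, dst, placement, comm),
                        reverse=True)
        # Best-gain candidate that still fits in the remaining quota.
        pick = next((o for o in candidates
                     if moved + obj_load[o] <= quota + tol), None)
        if pick is None:
            break
        candidates.remove(pick)
        placement[pick] = dst
        moved += obj_load[pick]
        chosen.append(pick)
    return chosen


if __name__ == "__main__":
    flows = diffusion_flows({0: 6.0, 1: 2.0}, {0: [1], 1: [0]})
    placement = {"a": 0, "b": 0, "c": 0, "d": 1}
    obj_load = {"a": 2.0, "b": 2.0, "c": 2.0, "d": 2.0}
    # Symmetric communication volumes between objects.
    comm = {"a": {"d": 10, "b": 1}, "b": {"a": 1, "c": 5},
            "c": {"b": 5}, "d": {"a": 10}}
    print(flows, select_objects(0, 1, flows[(0, 1)], placement, obj_load, comm))
```

With loads of 6 and 2 on two connected nodes, diffusion asks node 0 to send about 2 units to node 1, and the greedy step moves object "a" because it exchanges the most data with "d", which already lives on node 1. A purely load-driven choice could just as well have moved "b" or "c", which would have split the heavily communicating pair "b"/"c" across nodes instead.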

