Reclaiming Idle CPU Cycles on Kubernetes: Sparse-Domain Multiplexing for Concurrent MPI-CFD Simulations
This addresses inefficiencies in high-performance computing for researchers and engineers using shared Kubernetes clusters, though it is incremental as it builds on existing scheduling and profiling techniques.
The paper tackles the problem of idle CPU cycles during MPI-parallel simulations on Kubernetes clusters by introducing a multiplexing framework that co-locates multiple simulations to reclaim capacity, achieving up to 3.74x throughput scaling with minimal overhead.
When MPI-parallel simulations run on shared Kubernetes clusters, conventional CPU scheduling leaves the vast majority of provisioned cycles idle at synchronization barriers. This paper presents a multiplexing framework that reclaims this idle capacity by co-locating multiple simulations on the same cluster. PMPI-based duty-cycle profiling quantifies the per-rank idle fraction; proportional CPU allocation then allows a second simulation to execute concurrently with minimal overhead, yielding 1.77x throughput. A Pareto sweep to N=5 concurrent simulations shows throughput scaling to 3.74x, with a knee at N=3 offering the best efficiency-cost trade-off. An analytical model with a single fitted parameter predicts these gains within +/-4%. A dynamic controller automates the full pipeline, from profiling through In-Place Pod Vertical Scaling (KEP-1287) to packing and fairness monitoring, achieving 3.25x throughput for four simulations without manual intervention or pod restarts. To our knowledge, this is the first CPU application of In-Place Pod Vertical Scaling to running MPI processes. Experiments on an AWS cluster with OpenFOAM CFD confirm that the results hold under both concentric and standard graph-based (Scotch) mesh partitioning.