Malleable Molecular Dynamics Simulations with GROMACS and DMR
For HPC users running molecular dynamics simulations, this work addresses the resource inefficiency of static allocations by making the MPI process count of GROMACS malleable, although the contribution is an incremental integration of existing middleware.
The authors integrated the DMR middleware into GROMACS so that a running simulation can change its MPI process count, letting it adapt to bursty workloads. On MareNostrum 5, they report node-hour savings and fewer idle resources than with static allocations.
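Node-hour savings come from a simple accounting rule: a job is charged nodes × wall-clock hours for every interval it holds resources. The following minimal sketch compares a static run that holds its peak allocation throughout against a malleable run that shrinks during a quiet phase. The phase lengths, node counts, and the 0.1 h per-reconfiguration overhead are hypothetical numbers chosen for illustration, not measurements from MareNostrum 5.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One interval of a run: how many nodes it held and for how long."""
    nodes: int
    hours: float


def node_hours(phases: list[Phase]) -> float:
    """Node-hours charged = sum over phases of nodes x wall-clock hours."""
    return sum(p.nodes * p.hours for p in phases)


def time_to_solution(phases: list[Phase]) -> float:
    """Total wall-clock time of the run, in hours."""
    return sum(p.hours for p in phases)


# Hypothetical bursty profile. The static job keeps its peak (8 nodes)
# for the whole run; the malleable job shrinks to 2 nodes in the quiet phase.
RECONF_OVERHEAD_H = 0.1  # assumed checkpoint/restart cost per reconfiguration

static_run = [Phase(nodes=8, hours=10.0)]
dynamic_run = [
    Phase(nodes=8, hours=3.0),
    Phase(nodes=2, hours=4.0 + RECONF_OVERHEAD_H),  # after shrinking
    Phase(nodes=8, hours=3.0 + RECONF_OVERHEAD_H),  # after expanding again
]

static_nh = node_hours(static_run)    # 8 * 10 = 80 node-hours
dynamic_nh = node_hours(dynamic_run)  # 24 + 8.2 + 24.8 = 57 node-hours
saving = 1.0 - dynamic_nh / static_nh

print(f"static:  {static_nh:.1f} node-h, {time_to_solution(static_run):.1f} h")
print(f"dynamic: {dynamic_nh:.1f} node-h, {time_to_solution(dynamic_run):.1f} h")
print(f"node-hour saving: {saving:.1%}")
```

In this toy profile the malleable run finishes slightly later (10.2 h versus 10.0 h) because of the reconfiguration overhead, yet is charged roughly 29% fewer node-hours, which illustrates the time-to-solution versus node-hour trade-off the evaluation quantifies.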
Static resource allocations in high-performance computing (HPC) lead to inefficiencies for time-varying workloads, causing idle resources, queue delays, and higher node-hour costs. The Dynamic Management of Resources (DMR) middleware enables MPI process malleability in Slurm via a simple API decoupled from scheduler internals. In this work, we integrate DMR into the GROMACS molecular dynamics engine to obtain a malleable variant that can dynamically adapt its MPI process count by combining communication-efficiency-aware reconfiguration with GROMACS' native checkpoint/restart mechanism. We evaluate this design on the MareNostrum 5 supercomputer, comparing dynamic runs against static executions and quantifying reconfiguration overheads, time-to-solution, and node-hour savings for bursty GROMACS workloads.
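The abstract does not spell out the reconfiguration policy, so the following is only a hypothetical sketch of what a communication-efficiency-aware decision could look like. It assumes per-step timings are available for several process counts, picks the largest count that fits the resources offered and keeps strong-scaling efficiency above an assumed threshold of 0.7, and reports whether to expand, shrink, or continue. The function names, threshold, and timings are illustrative assumptions; this is neither the DMR API nor the authors' implementation.

```python
def parallel_efficiency(t_base: float, p_base: int, t_p: float, p: int) -> float:
    """Strong-scaling efficiency of p processes relative to a p_base baseline."""
    return (t_base * p_base) / (t_p * p)


def choose_process_count(step_times: dict[int, float], available: int,
                         min_efficiency: float = 0.7) -> int:
    """Largest measured process count that fits in `available` and keeps
    efficiency >= min_efficiency relative to the smallest measured count."""
    p_base = min(step_times)
    t_base = step_times[p_base]
    candidates = [p for p in sorted(step_times) if p <= available]
    if not candidates:
        raise ValueError("no measured process count fits the offered resources")
    best = candidates[0]
    for p in candidates:
        if parallel_efficiency(t_base, p_base, step_times[p], p) >= min_efficiency:
            best = p
    return best


def maybe_reconfigure(current: int, step_times: dict[int, float],
                      available: int, min_efficiency: float = 0.7) -> tuple[int, str]:
    """Return (new process count, action). In a real malleable run, a change
    would trigger a GROMACS checkpoint and a restart at the new size; this
    sketch only reports the decision."""
    target = choose_process_count(step_times, available, min_efficiency)
    if target == current:
        return current, "continue"
    return target, "expand" if target > current else "shrink"


# Hypothetical seconds per MD step at each process count.
step_times = {32: 1.00, 64: 0.52, 128: 0.30, 256: 0.22}

print(maybe_reconfigure(32, step_times, available=256))   # grow while efficient
print(maybe_reconfigure(128, step_times, available=64))   # forced to shrink
print(maybe_reconfigure(128, step_times, available=512))  # 256 is too inefficient
```

With these timings the efficiencies relative to 32 processes are about 0.96 (64), 0.83 (128), and 0.57 (256), so even when 256 or more processes are offered the sketch stops at 128, trading a little speed for fewer wasted node-hours.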