CE, DC, LG, AO-PH
Apr 13, 2021

Using Machine Learning at Scale in HPC Simulations with SmartSim: An Application to Ocean Climate Modeling

arXiv:2104.09355v1
23 citations
AI Analysis

This work enables climate scientists to incorporate machine learning into traditional HPC simulations at scale, though it is incremental as it applies existing methods to a new domain.

The authors tackled the challenge of integrating distributed, online deep neural network inference into high-performance computing (HPC) simulations for ocean climate modeling, demonstrating stable, large-scale ensemble simulations with 970 billion inferences and minimal runtime impact.

We demonstrate the first climate-scale, numerical ocean simulations improved through distributed, online inference of Deep Neural Networks (DNN) using SmartSim. SmartSim is a library dedicated to enabling online analysis and Machine Learning (ML) for traditional HPC simulations. In this paper, we detail the SmartSim architecture and provide benchmarks including online inference with a shared ML model on heterogeneous HPC systems. We demonstrate the capability of SmartSim by using it to run a 12-member ensemble of global-scale, high-resolution ocean simulations, each spanning 19 compute nodes, all communicating with the same ML architecture at each simulation timestep. In total, 970 billion inferences are collectively served by running the ensemble for a total of 120 simulated years. Finally, we show our solution is stable over the full duration of the model integrations, and that the inclusion of machine learning has minimal impact on the simulation runtimes.
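The scale figures quoted in the abstract can be related to one another with some quick arithmetic. The sketch below uses only the numbers stated above; it assumes that the 120 simulated years are the sum over all ensemble members, and that the 19 nodes per member count simulation nodes only (any nodes hosting the shared ML model are not included). The variable names are illustrative and do not come from SmartSim.

```python
# Back-of-the-envelope figures derived only from numbers quoted in the abstract.
ENSEMBLE_MEMBERS = 12
NODES_PER_MEMBER = 19            # simulation nodes only (assumption)
TOTAL_SIMULATED_YEARS = 120      # assumed to be summed over all members
TOTAL_INFERENCES = 970e9

# Compute nodes occupied by the simulations themselves.
total_simulation_nodes = ENSEMBLE_MEMBERS * NODES_PER_MEMBER

# Simulated years per member, if the total is split evenly (assumption).
years_per_member = TOTAL_SIMULATED_YEARS / ENSEMBLE_MEMBERS

# Average inference load per simulated year and per ensemble member.
inferences_per_simulated_year = TOTAL_INFERENCES / TOTAL_SIMULATED_YEARS
inferences_per_member = TOTAL_INFERENCES / ENSEMBLE_MEMBERS

print(f"simulation nodes:              {total_simulation_nodes}")
print(f"simulated years per member:    {years_per_member:.0f}")
print(f"inferences per simulated year: {inferences_per_simulated_year:.2e}")
print(f"inferences per member:         {inferences_per_member:.2e}")
```

Under these assumptions, every simulated year of a single member requires roughly eight billion inferences on average, which is why the shared model has to be served online rather than evaluated offline.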
