Nov 8, 2022

Simulation-Based Parallel Training

arXiv:2211.04119v2 · 2 citations · h-index: 25
AI Analysis

This work addresses data bottlenecks faced by researchers who apply ML to scientific simulations, although it appears incremental because it builds on existing parallel training concepts.

The paper tackles the slow, memory-intensive generation of simulation data for supervised ML in science. It develops a parallel training framework that generates data concurrently with training and applies it successfully to the multi-parametric Lorenz attractor, where it outperforms offline training.
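To make the test case concrete, here is a minimal sketch of a multi-parametric Lorenz data generator. The Lorenz system is dx/dt = σ(y − x), dy/dt = x(ρ − z) − y, dz/dt = xy − βz, and "multi-parametric" is read here as drawing a fresh (σ, ρ, β) for each trajectory. The parameter ranges, the RK4 integrator, the step size and the supervised pairing (parameters, x_t) → x_{t+1} are all assumptions for illustration; the source does not say how the paper samples or integrates the system.

```python
import random

def lorenz_rhs(state, sigma, rho, beta):
    """Lorenz vector field: (dx/dt, dy/dt, dz/dt) for parameters (sigma, rho, beta)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt, params):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shift(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz_rhs(state, *params)
    k2 = lorenz_rhs(shift(state, k1, 0.5 * dt), *params)
    k3 = lorenz_rhs(shift(state, k2, 0.5 * dt), *params)
    k4 = lorenz_rhs(shift(state, k3, dt), *params)
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def sample_params(rng):
    """Hypothetical sampling ranges around the classic (10, 28, 8/3) parameter set."""
    return (rng.uniform(8.0, 12.0), rng.uniform(20.0, 35.0), rng.uniform(2.0, 3.0))

def simulate(params, x0, dt=0.01, n_steps=1000):
    """Integrate one trajectory and return assumed supervised samples (params, x_t, x_{t+1})."""
    samples, state = [], x0
    for _ in range(n_steps):
        nxt = rk4_step(state, dt, params)
        samples.append((params, state, nxt))
        state = nxt
    return samples

if __name__ == "__main__":
    rng = random.Random(0)
    samples = simulate(sample_params(rng), x0=(1.0, 1.0, 1.0), n_steps=500)
    print(len(samples))  # 500 samples from one parameter draw
```

Each call to `simulate` plays the role of one (slow) simulation run; in an offline setting many such runs would be generated and stored before training starts.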

Numerical simulations are ubiquitous in science and engineering. Machine learning for science investigates how artificial neural architectures can learn from these simulations to speed up scientific discovery and engineering processes. Most of these architectures are trained in a supervised manner. They require tremendous amounts of data from simulations that are slow to generate and memory-greedy. In this article, we present our ongoing work to design a training framework that alleviates those bottlenecks. It generates data in parallel with the training process. Such simultaneity induces a bias in the data available during training. We present a strategy to mitigate this bias with a memory buffer. We test our framework on the multi-parametric Lorenz attractor. We show the benefit of our framework compared to offline training and the success of our data-bias mitigation strategy in capturing the complex chaotic dynamics of the system.
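The abstract does not specify how the memory buffer works, so the following is only a sketch of one plausible design. Simulations run as producer threads that push samples into a bounded queue; the training loop moves each arriving sample into a fixed-capacity buffer and draws uniform mini-batches from it. Without the buffer, batches would follow arrival order and be dominated by consecutive, correlated time steps of whichever simulations happen to be running, which is one reading of the bias the abstract mentions. The reservoir-sampling eviction policy, the names `ReservoirBuffer`, `producer`, `online_training` and `train_step`, and the toy `(sim_id, t)` samples are all hypothetical; `train_step` stands in for a real optimizer update.

```python
import queue
import random
import threading

class ReservoirBuffer:
    """Fixed-capacity memory buffer (assumed policy: reservoir sampling, so every
    sample seen so far has the same probability of being retained)."""
    def __init__(self, capacity, rng):
        self.capacity, self.rng = capacity, rng
        self.data, self.seen = [], 0

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            j = self.rng.randrange(self.seen)  # keep new sample with prob capacity/seen
            if j < self.capacity:
                self.data[j] = sample

    def sample_batch(self, batch_size):
        """Uniform mini-batch without replacement from the current buffer content."""
        return self.rng.sample(self.data, min(batch_size, len(self.data)))

def producer(sim_id, n_steps, out_queue):
    """Hypothetical stand-in for one simulation: emits one sample per time step."""
    for t in range(n_steps):
        out_queue.put((sim_id, t))
    out_queue.put(None)  # sentinel: this simulation has finished

def train_step(batch):
    """Placeholder for one optimizer update on `batch` (hypothetical)."""
    return len(batch)

def online_training(n_sims=4, n_steps=50, capacity=64, batch_size=16, seed=0):
    rng = random.Random(seed)
    q = queue.Queue(maxsize=128)  # bounded: producers block if training falls behind
    buf = ReservoirBuffer(capacity, rng)
    workers = [threading.Thread(target=producer, args=(i, n_steps, q))
               for i in range(n_sims)]
    for w in workers:
        w.start()
    finished, n_batches = 0, 0
    while finished < n_sims:
        item = q.get()
        if item is None:
            finished += 1
            continue
        buf.add(item)
        if len(buf.data) >= batch_size:  # train as soon as enough data has arrived
            train_step(buf.sample_batch(batch_size))
            n_batches += 1
    for w in workers:
        w.join()
    return buf, n_batches

if __name__ == "__main__":
    buf, n_batches = online_training()
    print(buf.seen, len(buf.data), n_batches)  # 200 64 185
```

Reservoir sampling keeps older trajectories represented as new ones stream in; a FIFO buffer would be simpler but would retain only the most recent data. Threads keep the sketch self-contained; in practice the simulations would more likely run as separate processes or cluster jobs.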
