Lucas Meyer

h-index3

5papers

139citations

Novelty45%

AI Score42

Ranked #58,563 of 194,257 authors (top 30%)#13,309 in LG (top 33%)

5 Papers

3.8LGSep 28, 2023Code

High Throughput Training of Deep Surrogates from Large Ensemble Runs

Lucas Meyer, Marc Schouler, Robert Alexander Caulk et al.

Recent years have seen a surge in deep learning approaches to accelerate numerical solvers, which provide faithful but computationally intensive simulations of the physical world. These deep surrogates are generally trained in a supervised manner from limited amounts of data slowly generated by the same solver they intend to accelerate. We propose an open-source framework that enables the online training of these models from a large ensemble run of simulations. It leverages multiple levels of parallelism to generate rich datasets. The framework avoids I/O bottlenecks and storage issues by directly streaming the generated data. A training reservoir mitigates the inherent bias of streaming while maximizing GPU throughput. Experiment on training a fully connected network as a surrogate for the heat equation shows the proposed approach enables training on 8TB of data in 2 hours with an accuracy improved by 47% and a batch throughput multiplied by 13 compared to a traditional offline procedure.

2.5AINov 8, 2022

Simulation-Based Parallel Training

Lucas Meyer, Alejandro Ribés, Bruno Raffin

Numerical simulations are ubiquitous in science and engineering. Machine learning for science investigates how artificial neural architectures can learn from these simulations to speed up scientific discovery and engineering processes. Most of these architectures are trained in a supervised manner. They require tremendous amounts of data from simulations that are slow to generate and memory greedy. In this article, we present our ongoing work to design a training framework that alleviates those bottlenecks. It generates data in parallel with the training process. Such simultaneity induces a bias in the data available during the training. We present a strategy to mitigate this bias with a memory buffer. We test our framework on the multi-parametric Lorenz's attractor. We show the benefit of our framework compared to offline training and the success of our data bias mitigation strategy to capture the complex chaotic dynamics of the system.

37.3LGNov 30, 2024Code

The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning

Ruben Ohana, Michael McCabe, Lucas Meyer et al. · cambridge

Machine learning based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data is available at https://github.com/PolymathicAI/the_well.

2.3IMOct 20, 2025

Universal Spectral Tokenization via Self-Supervised Panchromatic Representation Learning

Jeff Shen, Francois Lanusse, Liam Holden Parker et al. · cambridge

Sequential scientific data span many resolutions and domains, and unifying them into a common representation is a key step toward developing foundation models for the sciences. Astronomical spectra exemplify this challenge: massive surveys have collected millions of spectra across a wide range of wavelengths and resolutions, yet analyses remain fragmented across spectral domains (e.g., optical vs. infrared) and object types (e.g., stars vs. galaxies), limiting the ability to pool information across datasets. We present a deep learning model that jointly learns from heterogeneous spectra in a self-supervised manner. Our universal spectral tokenizer processes spectra from a variety of object types and resolutions directly on their native wavelength grids, producing intrinsically aligned, homogeneous, and physically meaningful representations that can be efficiently adapted to achieve competitive performance across a range of downstream tasks. For the first time, we demonstrate that a single model can unify spectral data across resolutions and domains, suggesting that our model can serve as a powerful building block for foundation models in astronomy -- and potentially extend to other scientific domains with heterogeneous sequential data, such as climate and healthcare.

3.1LGDec 16, 2021

Deep Surrogate for Direct Time Fluid Dynamics

Lucas Meyer, Louen Pottier, Alejandro Ribes et al.

The ubiquity of fluids in the physical world explains the need to accurately simulate their dynamics for many scientific and engineering applications. Traditionally, well established but resource intensive CFD solvers provide such simulations. The recent years have seen a surge of deep learning surrogate models substituting these solvers to alleviate the simulation process. Some approaches to build data-driven surrogates mimic the solver iterative process. They infer the next state of the fluid given its previous one. Others directly infer the state from time input. Approaches also differ in their management of the spatial information. Graph Neural Networks (GNN) can address the specificity of the irregular meshes commonly used in CFD simulations. In this article, we present our ongoing work to design a novel direct time GNN architecture for irregular meshes. It consists of a succession of graphs of increasing size connected by spline convolutions. We test our architecture on the Von K{á}rm{á}n's vortex street benchmark. It achieves small generalization errors while mitigating error accumulation along the trajectory.