Jonah Miller

LG
h-index46
5papers
217citations
Novelty18%
AI Score27

5 Papers

LGNov 30, 2024Code
The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning

Ruben Ohana, Michael McCabe, Lucas Meyer et al. · cambridge

Machine learning based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data is available at https://github.com/PolymathicAI/the_well.

LGNov 22, 2024
What You See is Not What You Get: Neural Partial Differential Equations and The Illusion of Learning

Arvind Mohan, Ashesh Chattopadhyay, Jonah Miller

Differentiable Programming for scientific machine learning (SciML) has recently seen considerable interest and success, as it directly embeds neural networks inside PDEs, often called as NeuralPDEs, derived from first principle physics. Therefore, there is a widespread assumption in the community that NeuralPDEs are more trustworthy and generalizable than black box models. However, like any SciML model, differentiable programming relies predominantly on high-quality PDE simulations as "ground truth" for training. However, mathematics dictates that these are only discrete numerical approximations of the true physics. Therefore, we ask: Are NeuralPDEs and differentiable programming models trained on PDE simulations as physically interpretable as we think? In this work, we rigorously attempt to answer these questions, using established ideas from numerical analysis, experiments, and analysis of model Jacobians. Our study shows that NeuralPDEs learn the artifacts in the simulation training data arising from the discretized Taylor Series truncation error of the spatial derivatives. Additionally, NeuralPDE models are systematically biased, and their generalization capability is likely enabled by a fortuitous interplay of numerical dissipation and truncation error in the training dataset and NeuralPDE, which seldom happens in practical applications. This bias manifests aggressively even in relatively accessible 1-D equations, raising concerns about the veracity of differentiable programming on complex, high-dimensional, real-world PDEs, and in dataset integrity of foundation models. Further, we observe that the initial condition constrains the truncation error in initial-value problems in PDEs, thereby exerting limitations to extrapolation. Finally, we demonstrate that an eigenanalysis of model weights can indicate a priori if the model will be inaccurate for out-of-distribution testing.

GR-QCNov 26, 2019
Enabling real-time multi-messenger astrophysics discoveries with deep learning

E. A. Huerta, Gabrielle Allen, Igor Andreoni et al.

Multi-messenger astrophysics is a fast-growing, interdisciplinary field that combines data, which vary in volume and speed of data processing, from many different instruments that probe the Universe using different cosmic messengers: electromagnetic waves, cosmic rays, gravitational waves and neutrinos. In this Expert Recommendation, we review the key challenges of real-time observations of gravitational wave sources and their electromagnetic and astroparticle counterparts, and make a number of recommendations to maximize their potential for scientific discovery. These recommendations refer to the design of scalable and computationally efficient machine learning algorithms; the cyber-infrastructure to numerically simulate astrophysical sources, and to process and interpret multi-messenger astrophysics data; the management of gravitational wave detections to trigger real-time alerts for electromagnetic and astroparticle follow-ups; a vision to harness future developments of machine learning and cyber-infrastructure resources to cope with the big-data requirements; and the need to build a community of experts to realize the goals of multi-messenger astrophysics.

HCAug 31, 2017
Good Usability Practices in Scientific Software Development

Francisco Queiroz, Raniere Silva, Jonah Miller et al.

Scientific software often presents very particular requirements regarding usability, which is often completely overlooked in this setting. As computational science has emerged as its own discipline, distinct from theoretical and experimental science, it has put new requirements on future scientific software developments. In this paper, we discuss the background of these problems and introduce nine aspects of good usability. We also highlight best practices for each aspect with an emphasis on applications in computational science.

SEMay 7, 2017
Report on the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4)

Daniel S. Katz, Kyle E. Niemeyer, Sandra Gesing et al.

This report records and discusses the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4). The report includes a description of the keynote presentation of the workshop, the mission and vision statements that were drafted at the workshop and finalized shortly after it, a set of idea papers, position papers, experience papers, demos, and lightning talks, and a panel discussion. The main part of the report covers the set of working groups that formed during the meeting, and for each, discusses the participants, the objective and goal, and how the objective can be reached, along with contact information for readers who may want to join the group. Finally, we present results from a survey of the workshop attendees.