Applications of Gaussian Processes at Extreme Lengthscales: From Molecules to Black Holes

Cambridge
arXiv:2303.14291v13 citationsh-index: 16Has Code
Originality Synthesis-oriented
AI Analysis

It addresses data scarcity problems for scientists in astrophysics, chemistry, and materials science, though the methods are incremental extensions of existing GP frameworks.

The thesis applied Gaussian processes (GPs) to tackle data scarcity in scientific domains, achieving results such as inferring black hole accretion disc models and discovering novel photoswitch molecules. It also developed open-source software and a Bayesian optimization scheme for robust material discovery.

In many areas of the observational and experimental sciences data is scarce. Data observation in high-energy astrophysics is disrupted by celestial occlusions and limited telescope time while data derived from laboratory experiments in synthetic chemistry and materials science is time and cost-intensive to collect. On the other hand, knowledge about the data-generation mechanism is often available in the sciences, such as the measurement error of a piece of laboratory apparatus. Both characteristics, small data and knowledge of the underlying physics, make Gaussian processes (GPs) ideal candidates for fitting such datasets. GPs can make predictions with consideration of uncertainty, for example in the virtual screening of molecules and materials, and can also make inferences about incomplete data such as the latent emission signature from a black hole accretion disc. Furthermore, GPs are currently the workhorse model for Bayesian optimisation, a methodology foreseen to be a guide for laboratory experiments in scientific discovery campaigns. The first contribution of this thesis is to use GP modelling to reason about the latent emission signature from the Seyfert galaxy Markarian 335, and by extension, to reason about the applicability of various theoretical models of black hole accretion discs. The second contribution is to extend the GP framework to molecular and chemical reaction representations and to provide an open-source software library to enable the framework to be used by scientists. The third contribution is to leverage GPs to discover novel and performant photoswitch molecules. The fourth contribution is to introduce a Bayesian optimisation scheme capable of modelling aleatoric uncertainty to facilitate the identification of material compositions that possess intrinsic robustness to large scale fabrication processes.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes