LG DATA-AN ML
Apr 25, 2020

A Bayesian machine scientist to aid in the solution of challenging scientific problems

Roger Guimera, Ignasi Reichardt, Antoni Aguilar-Mogas, Francesco A Massucci, Manuel Miranda, Jordi Pallares, Marta Sales-Pardo

arXiv:2004.12157v1 20.8156 citations

Originality: Incremental advance

AI Analysis

The paper addresses the need for automated, interpretable model discovery in fields ranging from physics to the social sciences. It is rated incremental because it builds on established Bayesian inference and MCMC methods.

The authors tackle the problem of automatically extracting closed-form, interpretable mathematical models from data. They introduce a Bayesian machine scientist that explores the space of models with Markov chain Monte Carlo (MCMC), scores each model with an explicit approximation to the marginal posterior over models, and learns its prior expectations from a large empirical corpus of mathematical expressions. They show that it uncovers accurate models for synthetic and real data, with out-of-sample predictions more accurate than those of existing approaches and of other nonparametric methods.

Closed-form, interpretable mathematical models have been instrumental for advancing our understanding of the world; with the data revolution, we may now be in a position to uncover new such models for many systems from physics to the social sciences. However, to deal with increasing amounts of data, we need "machine scientists" that are able to extract these models automatically from data. Here, we introduce a Bayesian machine scientist, which establishes the plausibility of models using explicit approximations to the exact marginal posterior over models and establishes its prior expectations about models by learning from a large empirical corpus of mathematical expressions. It explores the space of models using Markov chain Monte Carlo. We show that this approach uncovers accurate models for synthetic and real data and provides out-of-sample predictions that are more accurate than those of existing approaches and of other nonparametric methods.
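As a rough illustration of the idea in the abstract, the following minimal sketch runs a Metropolis sampler over a tiny, fixed set of candidate models. It rests on several assumptions that are not taken from the abstract: the model space is a hand-written list of one-parameter formulas rather than general expressions, the marginal posterior is approximated with the Bayesian information criterion (BIC), and the prior over models is uniform rather than learned from a corpus. The names `CANDIDATES`, `description_length`, and `sample_models` are hypothetical.

```python
import math
import random

# Toy model space (an assumption): one-parameter models y = a * g(x).
CANDIDATES = {
    "a*x": lambda x: x,
    "a*x^2": lambda x: x * x,
    "a*sin(x)": math.sin,
    "a*exp(x)": math.exp,
}

def description_length(name, xs, ys, log_prior=0.0):
    """Return BIC/2 minus the log prior; lower means more plausible.

    BIC stands in for the approximation to the marginal posterior (assumed here),
    and log_prior=0.0 corresponds to a uniform prior over the candidates.
    """
    g = [CANDIDATES[name](x) for x in xs]
    # Closed-form least-squares estimate of the single parameter a.
    a = sum(gi * yi for gi, yi in zip(g, ys)) / sum(gi * gi for gi in g)
    sse = sum((yi - a * gi) ** 2 for gi, yi in zip(g, ys))
    n, k = len(xs), 1
    bic = n * math.log(max(sse, 1e-300) / n) + k * math.log(n)
    return bic / 2.0 - log_prior

def sample_models(xs, ys, steps=2000, seed=0):
    """Metropolis sampling over models; returns how often each model was visited."""
    rng = random.Random(seed)
    names = list(CANDIDATES)
    score = {m: description_length(m, xs, ys) for m in names}
    current = rng.choice(names)
    visits = {m: 0 for m in names}
    for _ in range(steps):
        proposal = rng.choice(names)  # symmetric proposal over the candidates
        delta = score[proposal] - score[current]
        # Accept with probability min(1, exp(-delta)).
        if delta <= 0 or rng.random() < math.exp(-delta):
            current = proposal
        visits[current] += 1
    return visits

# Synthetic data generated from y = 2 x^2 plus small Gaussian noise.
rng = random.Random(1)
xs = [0.1 * i for i in range(1, 31)]
ys = [2.0 * x * x + rng.gauss(0.0, 0.1) for x in xs]

visits = sample_models(xs, ys)
best = max(visits, key=visits.get)
print(best, visits)
```

The visit counts approximate the posterior probability of each candidate under these assumptions, so the most-visited formula is the most plausible one for the data. Searching an open-ended space of expressions would require proposal moves that edit the formulas themselves, which this sketch does not attempt.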
