LG COMP-PH

Oct 25, 2021

Scientific Machine Learning Benchmarks

Jeyan Thiyagalingam, Mallikarjun Shankar, Geoffrey Fox, Tony Hey

arXiv:2110.12773v124.4171 citations

Originality: Synthesis-oriented

AI Analysis

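This work tackles the problem of machine learning algorithm selection for scientists working with high-volume data from large-scale experimental facilities at national laboratories. Its contribution is incremental, since it extends existing benchmarking methods rather than introducing a new one.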

The paper addresses the challenge of selecting appropriate machine learning algorithms for scientific datasets. It describes the authors' approach to developing scientific machine learning benchmarks and reviews other benchmarking approaches, with the aim of supporting more automated data analysis at large-scale experimental facilities.

The breakthrough in Deep Learning neural networks has transformed the use of AI and machine learning technologies for the analysis of very large experimental datasets. These datasets are typically generated by large-scale experimental facilities at national laboratories. In the context of science, scientific machine learning focuses on training machines to identify patterns, trends, and anomalies to extract meaningful scientific insights from such datasets. With a new generation of experimental facilities, the rate of data generation and the scale of data volumes will increasingly require the use of more automated data analysis. At present, identifying the most appropriate machine learning algorithm for the analysis of any given scientific dataset is still a challenge for scientists. This is due to the many different machine learning frameworks, computer architectures, and machine learning models available. Historically, for modelling and simulation on HPC systems, such problems have been addressed through benchmarking computer applications, algorithms, and architectures. Extending such a benchmarking approach and identifying metrics for the application of machine learning methods to scientific datasets is a new challenge for both scientists and computer scientists. In this paper, we describe our approach to the development of scientific machine learning benchmarks and review other approaches to benchmarking scientific machine learning.
