LG COMP-PH Aug 11, 2023

Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields

Yatao Li, Wanling Gao, Lei Wang, Lixin Sun, Zun Wang, Jianfeng Zhan

arXiv:2308.05999v1 3.81 citations h-index: 101

Originality: Synthesis-oriented

AI Analysis

The paper addresses the problem of ineffective benchmarking for AI in science. Its contribution is incremental: it proposes specific benchmarking improvements for machine learning force fields.

This paper investigates the need for new benchmarking approaches for AI in science, using machine learning force fields as a case study, and proposes a suite of metrics that better assess model performance in real-world scientific applications compared to traditional methods.

AI for science (AI4S) is an emerging research field that aims to enhance the accuracy and speed of scientific computing tasks using machine learning methods. Traditional AI benchmarking methods struggle to adapt to the unique challenges posed by AI4S because they assume data in training, testing, and future real-world queries are independent and identically distributed, while AI4S workloads anticipate out-of-distribution problem instances. This paper investigates the need for a novel approach to effectively benchmark AI for science, using the machine learning force field (MLFF) as a case study. MLFF is a method to accelerate molecular dynamics (MD) simulation with low computational cost and high accuracy. We identify various missed opportunities in scientifically meaningful benchmarking and propose solutions to evaluate MLFF models, specifically in the aspects of sample efficiency, time domain sensitivity, and cross-dataset generalization capabilities. By setting up the problem instantiation similar to the actual scientific applications, more meaningful performance metrics from the benchmark can be achieved. This suite of metrics has demonstrated a better ability to assess a model's performance in real-world scientific applications, in contrast to traditional AI benchmarking methodologies. This work is a component of the SAIBench project, an AI4S benchmarking suite. The project homepage is https://www.computercouncil.org/SAIBench.
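The gap between in-distribution and out-of-distribution evaluation described in the abstract can be illustrated with a toy sketch. This is not the SAIBench methodology or any model from the paper: the Morse potential used as reference data, the harmonic fit used as a stand-in for a learned force field, and all names and sampling ranges below are assumptions chosen only for illustration.

```python
import math
import random


def morse_energy(r, d_e=1.0, a=1.5, r_e=1.0):
    """Toy reference energy of a diatomic bond (Morse potential, assumed here)."""
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2


def fit_harmonic(rs, es, r_e=1.0):
    """Least-squares fit of E ~ k * (r - r_e)^2, a stand-in for a learned model."""
    xs = [(r - r_e) ** 2 for r in rs]
    k = sum(x * e for x, e in zip(xs, es)) / sum(x * x for x in xs)
    return lambda r: k * (r - r_e) ** 2


def mae(model, rs):
    """Mean absolute energy error of the model against the reference."""
    return sum(abs(model(r) - morse_energy(r)) for r in rs) / len(rs)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Training and "traditional" test data: bond lengths near equilibrium.
train = [rng.uniform(0.9, 1.1) for _ in range(200)]
iid_test = [rng.uniform(0.9, 1.1) for _ in range(200)]

# Hypothetical out-of-distribution queries: a strongly stretched bond.
ood_test = [rng.uniform(1.5, 2.5) for _ in range(200)]

model = fit_harmonic(train, [morse_energy(r) for r in train])
iid_mae = mae(model, iid_test)
ood_mae = mae(model, ood_test)
print(f"IID MAE: {iid_mae:.4f}  OOD MAE: {ood_mae:.4f}")
```

In this sketch the test error on samples drawn like the training data stays small, while the error on stretched geometries is much larger, which is the kind of behavior an i.i.d. test split alone would not reveal.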
