COMP-PH LG Aug 13, 2020

A community-powered search of machine learning strategy space to find NMR property prediction models

Lars A. Bratholm, Will Gerrard, Brandon Anderson, Shaojie Bai, Sunghwan Choi, Lam Dang, Pavel Hanchar, Addison Howard, Guillaume Huard, Sanghoon Kim, Zico Kolter, Risi Kondor

arXiv:2008.05994v13.320 citations h-index: 53 Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the difficulty physical scientists face in selecting an optimal ML strategy for domain-specific predictions. It is incremental in that it applies existing community-competition methods to a new dataset.

The authors tackled the challenge of efficiently exploring the vast space of ML strategies for predicting NMR properties in molecules by organizing a community-powered Kaggle competition. Within 3 weeks, the competition produced models matching the authors' best previous efforts, and a meta-ensemble of the top predictions achieved 7-19x better accuracy than their previous state of the art.

The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published "in-house" efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.
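The meta-ensemble idea in the abstract, a linear combination of several models' predictions, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the synthetic data, the three hypothetical models with independent errors, the ordinary least-squares fit of the weights, and the plain mean absolute error metric. The abstract does not say how the authors chose their weights or scored the result, so this is not their procedure.

```python
import numpy as np

def fit_linear_ensemble(preds, target):
    """Least-squares weights for a linear combination of model predictions.

    preds:  array of shape (n_samples, n_models), one column per model
    target: array of shape (n_samples,), the reference values
    """
    weights, *_ = np.linalg.lstsq(preds, target, rcond=None)
    return weights

def mae(pred, target):
    """Mean absolute error between predictions and reference values."""
    return float(np.mean(np.abs(pred - target)))

# Synthetic stand-in data (hypothetical, not the competition dataset).
rng = np.random.default_rng(0)
target = rng.normal(size=2000)

# Three hypothetical models whose errors are independent and of different size.
noise_scales = np.array([0.30, 0.40, 0.50])
preds = target[:, None] + rng.normal(size=(2000, 3)) * noise_scales

# Fit the weights on one half and evaluate on the other half.
fit, hold = slice(0, 1000), slice(1000, 2000)
weights = fit_linear_ensemble(preds[fit], target[fit])
ensemble = preds[hold] @ weights

individual_mae = [mae(preds[hold, j], target[hold]) for j in range(3)]
ensemble_mae = mae(ensemble, target[hold])

print("weights:", np.round(weights, 3))
print("individual MAE:", np.round(individual_mae, 3))
print("ensemble MAE:", round(ensemble_mae, 3))
```

When the models' errors are only partly correlated, the weighted combination averages some of them out, which is why an ensemble can beat every individual model. The size of the gain depends on how diverse the models are.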
