Distributed Representations of Atoms and Materials for Machine Learning
This work addresses the need for effective machine learning models in materials science by providing new representations of atoms and compounds that can improve predictions for researchers in computational chemistry and materials design. It is incremental in that it builds on existing representation methods.
The authors address the problem of representing atoms and materials for machine learning in computational materials science. They derive distributed representations of compounds from chemical formulas alone and introduce SkipAtom, an approach for learning atom representations. They find the compound representations competitive with structure-based benchmarks on tasks such as formation energy and band gap prediction, and superior when only composition is available.
The use of machine learning is becoming increasingly common in computational materials science. To build effective models of the chemistry of materials, useful machine-based representations of atoms and their compounds are required. We derive distributed representations of compounds from their chemical formulas only, via pooling operations of distributed representations of atoms. These compound representations are evaluated on ten different tasks, such as the prediction of formation energy and band gap, and are found to be competitive with existing benchmarks that make use of structure, and even superior in cases where only composition is available. Finally, we introduce a new approach for learning distributed representations of atoms, named SkipAtom, which makes use of the growing information in materials structure databases.
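To make the pooling idea concrete, the following is a minimal sketch of how a compound vector could be built from atom vectors using only the chemical formula. The abstract does not say which pooling operations are used, so the sum, mean, and max variants here are assumptions. The 3-dimensional atom vectors, the `ATOM_VECTORS` table, and the `pool_compound` function are hypothetical and invented for illustration. They are not learned SkipAtom embeddings and not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical 3-dimensional atom vectors, invented for illustration only.
# Real distributed representations would be learned and much higher-dimensional.
ATOM_VECTORS: Dict[str, List[float]] = {
    "Ba": [0.2, -0.1, 0.5],
    "Ti": [0.7, 0.3, -0.2],
    "O": [-0.4, 0.6, 0.1],
}


def pool_compound(composition: Dict[str, float], mode: str = "mean") -> List[float]:
    """Pool atom vectors into one fixed-length compound vector.

    `composition` maps element symbols to their amounts in the formula,
    e.g. {"Ba": 1, "Ti": 1, "O": 3} for BaTiO3. No structure is used.
    """
    if mode not in ("sum", "mean", "max"):
        raise ValueError(f"unknown pooling mode: {mode}")

    dim = len(next(iter(ATOM_VECTORS.values())))

    if mode == "max":
        # Element-wise maximum over the distinct elements present
        # (stoichiometric amounts do not affect max-pooling).
        return [max(ATOM_VECTORS[el][i] for el in composition) for i in range(dim)]

    # Stoichiometry-weighted sum of the atom vectors.
    total = [
        sum(amount * ATOM_VECTORS[el][i] for el, amount in composition.items())
        for i in range(dim)
    ]
    if mode == "sum":
        return total

    # Mean-pooling: divide the weighted sum by the total number of atoms.
    n_atoms = sum(composition.values())
    return [value / n_atoms for value in total]


batio3 = {"Ba": 1, "Ti": 1, "O": 3}
sum_vec = pool_compound(batio3, mode="sum")
mean_vec = pool_compound(batio3, mode="mean")
max_vec = pool_compound(batio3, mode="max")
```

Whichever pooling operation is chosen, the result has the same length as a single atom vector, so compounds with different numbers of elements can be fed to the same downstream model for tasks such as formation energy or band gap prediction.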