Universal Semantic Embeddings of Chemical Elements for Enhanced Materials Inference and Discovery
This work aims to accelerate materials discovery, though its contribution appears incremental because it adapts existing BERT methods to a specific domain.
The researchers developed universal semantic embeddings of chemical elements and report gains of up to 23% in prediction accuracy on tasks such as predicting mechanical properties and classifying phase structures in alloys.
We present a framework for generating universal semantic embeddings of chemical elements to advance materials inference and discovery. This framework leverages ElementBERT, a domain-specific BERT-based natural language processing model trained on 1.29 million abstracts of alloy-related scientific papers, to capture latent knowledge and contextual relationships specific to alloys. These semantic embeddings serve as robust elemental descriptors, consistently outperforming traditional empirical descriptors with significant improvements across multiple downstream tasks. These include predicting mechanical and transformation properties, classifying phase structures, and optimizing materials properties via Bayesian optimization. Applications to titanium alloys, high-entropy alloys, and shape memory alloys demonstrate up to 23% gains in prediction accuracy. Our results show that ElementBERT surpasses general-purpose BERT variants by encoding specialized alloy knowledge. By bridging contextual insights from scientific literature with quantitative inference, our framework accelerates the discovery and optimization of advanced materials, with potential applications extending beyond alloys to other material classes.
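To make the idea of "embeddings as elemental descriptors" concrete, the sketch below shows one common way per-element vectors can be turned into a fixed-length alloy descriptor: a composition-weighted average. The abstract does not state how the embeddings are pooled, so this pooling scheme is an assumption. The function name `alloy_descriptor`, the 4-dimensional vectors in `TOY_EMBEDDINGS`, and the example composition are hypothetical placeholders, not ElementBERT outputs.

```python
import numpy as np

# Hypothetical 4-dimensional element vectors. These are placeholder values
# chosen for readability, NOT embeddings produced by ElementBERT.
TOY_EMBEDDINGS = {
    "Ti": np.array([1.0, 0.0, 0.0, 0.0]),
    "Al": np.array([0.0, 1.0, 0.0, 0.0]),
    "V": np.array([0.0, 0.0, 1.0, 0.0]),
}


def alloy_descriptor(composition, embeddings):
    """Return the composition-weighted mean of element embeddings.

    composition: dict mapping element symbol -> amount (any positive scale;
                 amounts are normalized to fractions internally).
    embeddings:  dict mapping element symbol -> 1-D NumPy array.
    """
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition amounts must sum to a positive value")

    dim = len(next(iter(embeddings.values())))
    descriptor = np.zeros(dim)
    for element, amount in composition.items():
        if element not in embeddings:
            raise KeyError(f"no embedding available for element {element!r}")
        # Weight each element vector by its normalized fraction.
        descriptor += (amount / total) * embeddings[element]
    return descriptor


# Illustrative composition (assumed fractions, for demonstration only).
descriptor = alloy_descriptor({"Ti": 0.90, "Al": 0.06, "V": 0.04}, TOY_EMBEDDINGS)
```

A vector built this way has the same length for every alloy regardless of how many elements it contains, so it could be passed to a standard regressor or classifier, or to a surrogate model inside a Bayesian optimization loop. Whether the authors use this or a different pooling is not specified in the abstract.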