Materials Informatics Transformer: A Language Model for Interpretable Materials Properties Prediction
This work addresses material property prediction, a task that enables high-throughput screening in materials science. Its contribution is incremental: it extends the existing LLM paradigm to a new domain.
The paper introduces the Materials Informatics Transformer (MatInFormer), a language model that learns the grammar of crystallography from tokenized space group information and can incorporate task-specific data. Its effectiveness is validated across 14 distinct datasets.
Recently, the remarkable capabilities of large language models (LLMs) have been demonstrated across a variety of research domains such as natural language processing, computer vision, and molecular modeling. We extend this paradigm to material property prediction with our model, the Materials Informatics Transformer (MatInFormer). Specifically, we introduce a novel approach that learns the grammar of crystallography through the tokenization of pertinent space group information. We further illustrate the adaptability of MatInFormer by incorporating task-specific data pertaining to Metal-Organic Frameworks (MOFs). Through attention visualization, we uncover the key features that the model prioritizes during property prediction. The effectiveness of our proposed model is empirically validated across 14 distinct datasets, thereby underscoring its potential for high-throughput screening through accurate material property prediction.
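To make the idea of tokenizing space group information concrete, the following is a minimal sketch of how a crystal might be turned into a token sequence for a transformer. It is not MatInFormer's actual vocabulary or pipeline, which the abstract does not specify. The token formats (`[SG_n]`, `[SYS_name]`) and the helper names `crystal_system`, `tokenize_crystal`, `build_vocab`, and `encode` are assumptions made for illustration. Only the mapping from space group number to crystal system follows the standard numbering of the 230 space groups.

```python
# Illustrative sketch only: the token scheme below is hypothetical and is
# not taken from the MatInFormer paper.

# Standard ranges of space group numbers (1-230) for the seven crystal systems.
CRYSTAL_SYSTEM_RANGES = [
    (1, 2, "triclinic"),
    (3, 15, "monoclinic"),
    (16, 74, "orthorhombic"),
    (75, 142, "tetragonal"),
    (143, 167, "trigonal"),
    (168, 194, "hexagonal"),
    (195, 230, "cubic"),
]


def crystal_system(space_group_number: int) -> str:
    """Return the crystal system for a space group number in 1..230."""
    for low, high, name in CRYSTAL_SYSTEM_RANGES:
        if low <= space_group_number <= high:
            return name
    raise ValueError(f"space group number out of range: {space_group_number}")


def tokenize_crystal(space_group_number: int, elements: list[str]) -> list[str]:
    """Build a token sequence from space group information and element symbols.

    Assumed layout: [CLS], space group token, crystal system token,
    one token per element, [SEP].
    """
    system = crystal_system(space_group_number)
    return [
        "[CLS]",
        f"[SG_{space_group_number}]",
        f"[SYS_{system}]",
        *elements,
        "[SEP]",
    ]


def build_vocab(token_sequences: list[list[str]]) -> dict[str, int]:
    """Assign integer ids to tokens in order of first appearance."""
    vocab: dict[str, int] = {}
    for sequence in token_sequences:
        for token in sequence:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Map tokens to the integer ids a transformer embedding layer would consume."""
    return [vocab[token] for token in tokens]


# Example: rock-salt NaCl (space group 225) and rutile TiO2 (space group 136).
nacl_tokens = tokenize_crystal(225, ["Na", "Cl"])
tio2_tokens = tokenize_crystal(136, ["Ti", "O"])
vocab = build_vocab([nacl_tokens, tio2_tokens])
nacl_ids = encode(nacl_tokens, vocab)
```

In this sketch the two structures share only the `[CLS]` and `[SEP]` tokens; their space group and crystal system tokens differ, which is the kind of symmetry signal a model could attend to during property prediction.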