What do Large Language Models know about materials?
This work addresses the problem of evaluating LLMs for scientific accuracy in materials science. The contribution is incremental: it builds on existing LLM capabilities to propose a domain-specific benchmark.
The paper investigates the factual knowledge of large language models (LLMs) about materials, using the Periodic Table of Elements as an example. It finds that tokenization and vocabulary affect the models' ability to generate correct information, and it derives a benchmark for assessing the applicability of LLMs in materials science.
Large Language Models (LLMs) are increasingly applied in mechanical engineering and materials science. As models that establish connections through the interface of language, LLMs can be applied for step-wise reasoning through the Processing-Structure-Property-Performance (PSPP) chain of materials science and engineering. Current LLMs are built to adequately represent a dataset that comprises most of the accessible internet. However, the internet mostly contains non-scientific content. If LLMs are to be applied for engineering purposes, it is valuable to investigate their intrinsic knowledge, here meaning the capacity to generate correct information about materials. In the current work, using the Periodic Table of Elements as an example, we highlight the role of vocabulary and tokenization in the uniqueness of material fingerprints, and we compare the capabilities of different state-of-the-art open models to generate factually correct output. This leads to a material knowledge benchmark that supports an informed choice of the steps in the PSPP chain for which LLMs are applicable and those for which specialized models are required.
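
The following minimal sketch illustrates what tokenization can mean for the uniqueness of a material fingerprint. It is not the method used in this work: the vocabulary `TOY_VOCAB`, the helper `greedy_tokenize`, and the choice of four element names are hypothetical, and greedy longest-match splitting is a simplification standing in for the subword tokenizers of real LLMs.

```python
# Toy illustration only: this vocabulary is hypothetical and far smaller
# than the vocabulary of any real LLM.
TOY_VOCAB = {"Iron", "Co", "balt", "Ni", "ckel", "p", "per"}


def greedy_tokenize(text, vocab):
    """Split text left to right, always taking the longest vocabulary entry."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining candidate first, then shorter ones.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                tokens.append(piece)
                pos = end
                break
        else:
            # No vocabulary entry matches: emit the single character as-is.
            tokens.append(text[pos])
            pos += 1
    return tokens


ELEMENTS = ["Iron", "Cobalt", "Nickel", "Copper"]

# The "fingerprint" of an element is taken here to be its token sequence.
fingerprints = {name: tuple(greedy_tokenize(name, TOY_VOCAB)) for name in ELEMENTS}

# Elements that the toy vocabulary represents as exactly one token.
single_token = [name for name, fp in fingerprints.items() if len(fp) == 1]

# Group elements by their first token to reveal shared prefixes.
by_first_token = {}
for name, fp in fingerprints.items():
    by_first_token.setdefault(fp[0], []).append(name)
shared_first_token = {tok: names for tok, names in by_first_token.items() if len(names) > 1}

for name, fp in fingerprints.items():
    print(f"{name:8s} -> {fp}")
print("single-token elements:", single_token)
print("shared first tokens:", shared_first_token)
```

Under this toy vocabulary only "Iron" is a single token, while "Cobalt" and "Copper" both begin with the token "Co", so a model would have to rely on the following tokens to tell them apart. Repeating the same check with the actual tokenizer of a given model would show which elements receive a unique, compact representation in that model's vocabulary.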