MTRL-SCI | LG | Mar 18, 2025

Causal Discovery from Data Assisted by Large Language Models

arXiv:2503.13833v1 | 6 citations | h-index: 4 | Applied Physics Letters
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of precisely engineering ferroelectric materials by combining LLM-driven literature analysis with data-driven causal discovery. It is, however, incremental, since it applies existing methods to a new domain.

The paper tackles the problem of discovering causal relationships in materials science by integrating high-resolution STEM data with insights from LLMs fine-tuned on domain literature. The result is a set of DAGs that map causal links in SmBFO and support hypotheses about how synthesis conditions affect properties such as the coercive field, which in turn guide experimental validation.

Knowledge-driven discovery of novel materials necessitates the development of causal models for property emergence. While in the classical physical paradigm causal relationships are deduced from physical principles or via experiment, the rapid accumulation of observational data necessitates learning causal relationships between dissimilar aspects of materials structure and functionality from observations. For this, it is essential to integrate experimental data with prior domain knowledge. Here we demonstrate this approach by combining high-resolution scanning transmission electron microscopy (STEM) data with insights derived from large language models (LLMs). By fine-tuning ChatGPT on domain-specific literature, such as arXiv papers on ferroelectrics, and combining the obtained information with data-driven causal discovery, we construct adjacency matrices for directed acyclic graphs (DAGs) that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3 (SmBFO). This approach enables us to hypothesize how synthesis conditions influence material properties, particularly the coercive field (E0), and guides experimental validation. The ultimate objective of this work is to develop a unified framework that integrates LLM-driven literature analysis with data-driven discovery, facilitating the precise engineering of ferroelectric materials by establishing clear connections between synthesis conditions and the resulting material properties.
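The abstract describes combining literature-derived prior knowledge with data-driven causal discovery to build a DAG adjacency matrix. The Python sketch below shows one simple way such a combination can work; it is an illustration under stated assumptions, not the authors' method. It assumes the LLM prior has been reduced to a causal ordering of variables (the names `sm_fraction`, `lattice_strain`, `polarization`, and `coercive_field` are hypothetical, not taken from the paper), allows only edges consistent with that ordering, and fits each variable on its allowed parents by ordinary least squares, keeping coefficients above an arbitrary threshold. The data are synthetic and generated from a known linear chain, so the recovered edges can be checked.

```python
import numpy as np

# HYPOTHETICAL variable names and ordering, standing in for an LLM-derived prior:
# chemistry -> structure -> polarization -> coercive field.
# Columns of X must follow this order.
VARIABLES = ["sm_fraction", "lattice_strain", "polarization", "coercive_field"]


def prior_mask(order):
    """Boolean mask of allowed edges: i -> j only if i precedes j in `order`."""
    d = len(order)
    return np.triu(np.ones((d, d), dtype=bool), k=1)


def fit_dag(X, mask, threshold=0.2):
    """Regress each variable on its allowed parents by ordinary least squares.

    Returns a weighted adjacency matrix W where W[i, j] != 0 means i -> j.
    Coefficients smaller in magnitude than `threshold` (an arbitrary cutoff)
    are set to zero.
    """
    Xc = X - X.mean(axis=0)  # center columns so no intercept term is needed
    d = Xc.shape[1]
    W = np.zeros((d, d))
    for j in range(d):
        parents = np.flatnonzero(mask[:, j])
        if parents.size == 0:
            continue  # root node in the assumed ordering
        coef, *_ = np.linalg.lstsq(Xc[:, parents], Xc[:, j], rcond=None)
        W[parents, j] = coef
    W[np.abs(W) < threshold] = 0.0
    return W


def is_acyclic(W):
    """Check the nonzero pattern of W for cycles with Kahn's algorithm."""
    A = W != 0
    indegree = A.sum(axis=0)
    queue = [j for j in range(A.shape[0]) if indegree[j] == 0]
    visited = 0
    while queue:
        i = queue.pop()
        visited += 1
        for j in np.flatnonzero(A[i]):
            indegree[j] -= 1
            if indegree[j] == 0:
                queue.append(j)
    return visited == A.shape[0]


# Synthetic data from a known linear chain (not real SmBFO measurements).
rng = np.random.default_rng(0)
n = 5000
sm = rng.normal(size=n)
strain = 0.8 * sm + 0.3 * rng.normal(size=n)
pol = -1.2 * strain + 0.3 * rng.normal(size=n)
ec = 0.9 * pol + 0.3 * rng.normal(size=n)
X = np.column_stack([sm, strain, pol, ec])

W = fit_dag(X, prior_mask(VARIABLES))
for i, j in zip(*np.nonzero(W)):
    print(f"{VARIABLES[i]} -> {VARIABLES[j]}: {W[i, j]:+.2f}")
```

Because edges are only permitted from earlier to later variables in the assumed ordering, the fitted matrix is acyclic by construction; `is_acyclic` is a sanity check that matters if the prior is instead supplied as an arbitrary set of permitted edges. Real STEM-derived descriptors are rarely linear with Gaussian noise, so this ordering-plus-regression step is only a placeholder for a full causal discovery algorithm.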
