Shuyi Jia

MTRL-SCI
h-index6
3papers
51citations
Novelty58%
AI Score31

3 Papers

MTRL-SCIJul 27, 2022
Atomic structure generation from reconstructing structural fingerprints

Victor Fung, Shuyi Jia, Jiaxin Zhang et al.

Data-driven machine learning methods have the potential to dramatically accelerate the rate of materials design over conventional human-guided approaches. These methods would help identify or, in the case of generative models, even create novel crystal structures of materials with a set of specified functional properties to then be synthesized or isolated in the laboratory. For crystal structure generation, a key bottleneck lies in developing suitable atomic structure fingerprints or representations for the machine learning model, analogous to the graph-based or SMILES representations used in molecular generation. However, finding data-efficient representations that are invariant to translations, rotations, and permutations, while remaining invertible to the Cartesian atomic coordinates remains an ongoing challenge. Here, we propose an alternative approach to this problem by taking existing non-invertible representations with the desired invariances and developing an algorithm to reconstruct the atomic coordinates through gradient-based optimization using automatic differentiation. This can then be coupled to a generative machine learning model which generates new materials within the representation space, rather than in the data-inefficient Cartesian space. In this work, we implement this end-to-end structure generation approach using atom-centered symmetry functions as the representation and conditional variational autoencoders as the generative model. We are able to successfully generate novel and valid atomic structures of sub-nanometer Pt nanoparticles as a proof of concept. Furthermore, this method can be readily extended to any suitable structural representation, thereby providing a powerful, generalizable framework towards structure-based generation.

MTRL-SCIMar 3, 2025
Pre-training Graph Neural Networks with Structural Fingerprints for Materials Discovery

Shuyi Jia, Shitij Govil, Manav Ramprasad et al.

In recent years, pre-trained graph neural networks (GNNs) have been developed as general models which can be effectively fine-tuned for various potential downstream tasks in materials science, and have shown significant improvements in accuracy and data efficiency. The most widely used pre-training methods currently involve either supervised training to fit a general force field or self-supervised training by denoising atomic structures equilibrium. Both methods require datasets generated from quantum mechanical calculations, which quickly become intractable when scaling to larger datasets. Here we propose a novel pre-training objective which instead uses cheaply-computed structural fingerprints as targets while maintaining comparable performance across a range of different structural descriptors. Our experiments show this approach can act as a general strategy for pre-training GNNs with application towards large scale foundational models for atomistic data.

MTRL-SCIJun 19, 2024
LLMatDesign: Autonomous Materials Discovery with Large Language Models

Shuyi Jia, Chao Zhang, Victor Fung

Discovering new materials can have significant scientific and technological implications but remains a challenging problem today due to the enormity of the chemical space. Recent advances in machine learning have enabled data-driven methods to rapidly screen or generate promising materials, but these methods still depend heavily on very large quantities of training data and often lack the flexibility and chemical understanding often desired in materials discovery. We introduce LLMatDesign, a novel language-based framework for interpretable materials design powered by large language models (LLMs). LLMatDesign utilizes LLM agents to translate human instructions, apply modifications to materials, and evaluate outcomes using provided tools. By incorporating self-reflection on its previous decisions, LLMatDesign adapts rapidly to new tasks and conditions in a zero-shot manner. A systematic evaluation of LLMatDesign on several materials design tasks, in silico, validates LLMatDesign's effectiveness in developing new materials with user-defined target properties in the small data regime. Our framework demonstrates the remarkable potential of autonomous LLM-guided materials discovery in the computational setting and towards self-driving laboratories in the future.