DIS-NNMar 20, 2023
Machine Learning Automated Approach for Enormous Synchrotron X-Ray Diffraction Data InterpretationXiaodong Zhao, YiXuan Luo, Juejing Liu et al. · deepmind
Manual analysis of XRD data is usually laborious and time consuming. The deep neural network (DNN) based models trained by synthetic XRD patterns are proved to be an automatic, accurate, and high throughput method to analysis common XRD data collected from solid sample in ambient environment. However, it remains unknown that whether synthetic XRD based models are capable to solve u-XRD mapping data for in-situ experiments involving liquid phase exhibiting lower quality with significant artifacts. In this study, we collected u-XRD mapping data from an LaCl3-calcite hydrothermal fluid system and trained two categories of models to solve the experimental XRD patterns. The models trained by synthetic XRD patterns show low accuracy (as low as 64%) when solving experimental u-XRD mapping data. The accuracy of the DNN models was significantly improved (90% or above) when training them with the dataset containing both synthetic and small number of labeled experimental u-XRD patterns. This study highlighted the importance of labeled experimental patterns on the training of DNN models to solve u-XRD mapping data from in-situ experiments involving liquid phase.
LGSep 15, 2025
OASIS: A Deep Learning Framework for Universal Spectroscopic Analysis Driven by Novel Loss FunctionsChris Young, Juejing Liu, Marie L. Mortensen et al.
The proliferation of spectroscopic data across various scientific and engineering fields necessitates automated processing. We introduce OASIS (Omni-purpose Analysis of Spectra via Intelligent Systems), a machine learning (ML) framework for technique-independent, automated spectral analysis, encompassing denoising, baseline correction, and comprehensive peak parameter (location, intensity, FWHM) retrieval without human intervention. OASIS achieves its versatility through models trained on a strategically designed synthetic dataset incorporating features from numerous spectroscopy techniques. Critically, the development of innovative, task-specific loss functions-such as the vicinity peak response (ViPeR) for peak localization-enabled the creation of compact yet highly accurate models from this dataset, validated with experimental data from Raman, UV-vis, and fluorescence spectroscopy. OASIS demonstrates significant potential for applications including in situ experiments, high-throughput optimization, and online monitoring. This study underscores the optimization of the loss function as a key resource-efficient strategy to develop high-performance ML models.
MTRL-SCIJul 9, 2025
Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine LearningJuejing Liu, Haydn Anderson, Noah I. Waxman et al.
New discoveries in chemistry and materials science, with increasingly expanding volume of requisite knowledge and experimental workload, provide unique opportunities for machine learning (ML) to take critical roles in accelerating research efficiency. Here, we demonstrate (1) the use of large language models (LLMs) for automated literature reviews, and (2) the training of an ML model to predict chemical knowledge (thermodynamic parameters). Our LLM-based literature review tool (LMExt) successfully extracted chemical information and beyond into a machine-readable structure, including stability constants for metal cation-ligand interactions, thermodynamic properties, and other broader data types (medical research papers, and financial reports), effectively overcoming the challenges inherent in each domain. Using the autonomous acquisition of thermodynamic data, an ML model was trained using the CatBoost algorithm for accurately predicting thermodynamic parameters (e.g., enthalpy of formation) of minerals. This work highlights the transformative potential of integrated ML approaches to reshape chemistry and materials science research.
MTRL-SCIMar 15, 2024
Accurate and Data-Efficient Micro-XRD Phase Identification Using Multi-Task Learning: Application to Hydrothermal FluidsYanfei Li, Juejing Liu, Xiaodong Zhao et al.
Traditional analysis of highly distorted micro-X-ray diffraction (μ-XRD) patterns from hydrothermal fluid environments is a time-consuming process, often requiring substantial data preprocessing and labeled experimental data. This study demonstrates the potential of deep learning with a multitask learning (MTL) architecture to overcome these limitations. We trained MTL models to identify phase information in μ-XRD patterns, minimizing the need for labeled experimental data and masking preprocessing steps. Notably, MTL models showed superior accuracy compared to binary classification CNNs. Additionally, introducing a tailored cross-entropy loss function improved MTL model performance. Most significantly, MTL models tuned to analyze raw and unmasked XRD patterns achieved close performance to models analyzing preprocessed data, with minimal accuracy differences. This work indicates that advanced deep learning architectures like MTL can automate arduous data handling tasks, streamline the analysis of distorted XRD patterns, and reduce the reliance on labor-intensive experimental datasets.