MTRL-SCIMay 4
From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and ChemistryAritra Roy, Kevin Shen, Andrew MacBride et al.
Large language models (LLMs) are rapidly changing how researchers in materials science and chemistry discover, organize, and act on scientific knowledge. This paper analyzes a broad set of community-developed LLM applications in an effort to identify emerging patterns in how these systems can be used across the scientific research lifecycle. We organize the projects into two complementary categories: Knowledge Infrastructure, systems that structure, retrieve, synthesize, and validate scientific information; and Action Systems, systems that execute, coordinate, or automate scientific work across computational and experimental environments. The submissions reveal a shift from single-purpose LLM tools toward integrated, multi-agent workflows that combine retrieval, reasoning, tool use, and domain-specific validation. Prominent themes include retrieval-augmented generation as grounding infrastructure, persistent structured knowledge representations, multimodal and multilingual scientific inputs, and early progress toward laboratory-integrated closed-loop systems. Together, these results suggest that LLMs are evolving from general-purpose assistants into composable infrastructure for scientific reasoning and action. This work provides a community snapshot of that transition and a practical taxonomy for understanding emerging LLM-enabled workflows in materials science and chemistry.
LGOct 7, 2025
Assessment of different loss functions for fitting equivalent circuit models to electrochemical impedance spectroscopy dataAli Jaberi, Amin Sadeghi, Runze Zhang et al.
Electrochemical impedance spectroscopy (EIS) data is typically modeled using an equivalent circuit model (ECM), with parameters obtained by minimizing a loss function via nonlinear least squares fitting. This paper introduces two new loss functions, log-B and log-BW, derived from the Bode representation of EIS. Using a large dataset of generated EIS data, the performance of proposed loss functions was evaluated alongside existing ones in terms of R2 scores, chi-squared, computational efficiency, and the mean absolute percentage error (MAPE) between the predicted component values and the original values. Statistical comparisons revealed that the choice of loss function impacts convergence, computational efficiency, quality of fit, and MAPE. Our analysis showed that X2 loss function (squared sum of residuals with proportional weighting) achieved the highest performance across multiple quality of fit metrics, making it the preferred choice when the quality of fit is the primary goal. On the other hand, log-B offered a slightly lower quality of fit while being approximately 1.4 times faster and producing lower MAPE for most circuit components, making log-B as a strong alternative. This is a critical factor for large-scale least squares fitting in data-driven applications, such as training machine learning models on extensive datasets or iterations.
LGJul 15, 2025
Exploring the Frontiers of kNN Noisy Feature Detection and Recovery for Self-Driving LabsQiuyu Shi, Kangming Li, Yao Fehlis et al.
Self-driving laboratories (SDLs) have shown promise to accelerate materials discovery by integrating machine learning with automated experimental platforms. However, errors in the capture of input parameters may corrupt the features used to model system performance, compromising current and future campaigns. This study develops an automated workflow to systematically detect noisy features, determine sample-feature pairings that can be corrected, and finally recover the correct feature values. A systematic study is then performed to examine how dataset size, noise intensity, and feature value distribution affect both the detectability and recoverability of noisy features. In general, high-intensity noise and large training datasets are conducive to the detection and correction of noisy features. Low-intensity noise reduces detection and recovery but can be compensated for by larger clean training data sets. Detection and correction results vary between features with continuous and dispersed feature distributions showing greater recoverability compared to features with discrete or narrow distributions. This systematic study not only demonstrates a model agnostic framework for rational data recovery in the presence of noise, limited data, and differing feature distributions but also provides a tangible benchmark of kNN imputation in materials data sets. Ultimately, it aims to enhance data quality and experimental precision in automated materials discovery.