MTRL-SCI, Nov 24, 2025
High-throughput validation of phase formability and simulation accuracy of Cantor alloys
Changjun Cheng, Daniel Persaud, Kangming Li et al.
High-throughput methods enable accelerated discovery of novel materials in complex systems such as high-entropy alloys, which exhibit intricate phase stability across vast compositional spaces. Computational approaches, including Density Functional Theory (DFT) and calculation of phase diagrams (CALPHAD), facilitate screening of phase formability as a function of composition and temperature. However, integrating computational predictions with experimental validation remains challenging in high-throughput studies. In this work, we introduce a confidence metric that quantifies the agreement between predictions and experimental observations, measuring how well machine learning models trained on either DFT or CALPHAD input account for experimental evidence. The experimental dataset was generated via high-throughput in-situ synchrotron X-ray diffraction on compositionally varied FeNiMnCr alloy libraries, heated from room temperature to ~1000 °C. Agreement between the observed and predicted phases was evaluated using either temperature-independent phase classification or a model that incorporates a temperature-dependent probability of phase formation. This integrated approach shows where computation and experiment agree strongly overall, while also identifying key discrepancies, particularly in FCC/BCC predictions in Mn-rich regions, to inform future model refinement.
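The abstract does not define the confidence metric, so the following is only an illustrative sketch of the two evaluation modes it mentions, not the authors' formulation. It assumes a "hard" score (the fraction of samples whose predicted phase label matches the observed one) and a "soft" score (the mean probability a model assigns to the observed phase). The function names, phase labels, and numbers are hypothetical.

```python
def hard_agreement(observed, predicted):
    """Fraction of samples whose predicted phase label equals the observed label."""
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def soft_agreement(observed, phase_probs):
    """Mean probability the model assigned to the phase that was actually observed."""
    total = sum(probs.get(o, 0.0) for o, probs in zip(observed, phase_probs))
    return total / len(observed)


# Hypothetical observations and predictions for four compositions.
observed = ["FCC", "FCC", "BCC", "FCC"]
predicted = ["FCC", "BCC", "BCC", "FCC"]
phase_probs = [
    {"FCC": 0.9, "BCC": 0.1},
    {"FCC": 0.4, "BCC": 0.6},
    {"FCC": 0.2, "BCC": 0.8},
    {"FCC": 0.7, "BCC": 0.3},
]

hard = hard_agreement(observed, predicted)
soft = soft_agreement(observed, phase_probs)
print(hard, soft)
```

The soft score rewards a model that was nearly right on the mismatched sample, which a label-only comparison cannot express.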
LG, Jul 15, 2025
Exploring the Frontiers of kNN Noisy Feature Detection and Recovery for Self-Driving Labs
Qiuyu Shi, Kangming Li, Yao Fehlis et al.
Self-driving laboratories (SDLs) have shown promise to accelerate materials discovery by integrating machine learning with automated experimental platforms. However, errors in the capture of input parameters may corrupt the features used to model system performance, compromising current and future campaigns. This study develops an automated workflow to systematically detect noisy features, determine which sample-feature pairings can be corrected, and finally recover the correct feature values. A systematic study is then performed to examine how dataset size, noise intensity, and feature value distribution affect both the detectability and recoverability of noisy features. In general, high-intensity noise and large training datasets are conducive to the detection and correction of noisy features. Low-intensity noise reduces detection and recovery, but this can be compensated for by larger clean training datasets. Detection and correction results vary between features, with continuous and dispersed feature distributions showing greater recoverability than discrete or narrow distributions. This systematic study not only demonstrates a model-agnostic framework for rational data recovery in the presence of noise, limited data, and differing feature distributions but also provides a tangible benchmark of kNN imputation in materials datasets. Ultimately, it aims to enhance data quality and experimental precision in automated materials discovery.
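As a rough illustration of the detect-then-recover idea, the sketch below estimates one feature of a sample from its k nearest neighbours in a clean reference set, flags the feature as noisy when the recorded value deviates from that estimate by more than a tolerance, and replaces it with the estimate. This is a minimal sketch under stated assumptions, not the authors' workflow: the function names, the Euclidean distance on the remaining features, the mean-of-neighbours estimate, the `tol` threshold, and the toy data are all hypothetical.

```python
import math


def knn_estimate(clean_rows, sample, feature_idx, k=3):
    """Mean of feature_idx over the k clean rows closest to sample.

    Distance is Euclidean over all features except feature_idx, so a corrupted
    value in that feature does not influence the neighbour search.
    """
    def dist(row):
        return math.sqrt(sum((row[j] - sample[j]) ** 2
                             for j in range(len(sample)) if j != feature_idx))

    neighbours = sorted(clean_rows, key=dist)[:k]
    return sum(row[feature_idx] for row in neighbours) / len(neighbours)


def detect_and_recover(clean_rows, sample, feature_idx, k=3, tol=0.5):
    """Return (is_noisy, recovered_sample) for a single sample-feature pairing."""
    estimate = knn_estimate(clean_rows, sample, feature_idx, k)
    is_noisy = abs(sample[feature_idx] - estimate) > tol
    recovered = list(sample)
    if is_noisy:
        recovered[feature_idx] = estimate  # overwrite the suspect value
    return is_noisy, recovered


# Hypothetical clean data: two clusters, third feature tracks the first two.
clean = [
    [1.0, 1.0, 2.0], [1.1, 0.9, 2.0], [0.9, 1.1, 2.0],
    [3.0, 3.0, 6.0], [3.1, 2.9, 6.0], [2.9, 3.1, 6.0],
]

noisy_flag, fixed = detect_and_recover(clean, [1.0, 1.05, 9.0], feature_idx=2)
clean_flag, untouched = detect_and_recover(clean, [3.0, 3.0, 6.1], feature_idx=2)
print(noisy_flag, fixed, clean_flag, untouched)
```

In this toy setup, the large deviation in the first sample is easy to flag, while a small deviation such as 6.1 versus 6.0 falls inside the tolerance and passes as clean, which mirrors the abstract's observation that low-intensity noise is harder to detect.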