MTRL-SCI Oct 19, 2023
Approaches for Uncertainty Quantification of AI-predicted Material Properties: A Comparison
Francesca Tavazza, Kamal Choudhary, Brian DeCost
The development of large databases of material properties, together with the availability of powerful computers, has allowed machine learning (ML) modeling to become a widely used tool for predicting material performance. While confidence intervals are commonly reported for such ML models, prediction intervals, i.e., the uncertainty on each individual prediction, are not as frequently available. Here, we investigate three easy-to-implement approaches to determine such individual uncertainty, comparing them across ten ML quantities spanning energetics, mechanical, electronic, optical, and spectral properties. Specifically, we focus on the Quantile approach, the direct machine learning of the prediction intervals, and Ensemble methods.
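To illustrate the ensemble idea named in the abstract above, here is a minimal sketch, not the authors' implementation. It assumes a bootstrap ensemble of one-dimensional least-squares lines; the names `fit_line` and `ensemble_predict` and the synthetic data are hypothetical. The spread across ensemble members reflects model disagreement only, so it is a rough per-prediction uncertainty rather than a full prediction interval that also accounts for noise in the targets.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x for 1-D inputs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate resample: fall back to a constant model
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def ensemble_predict(xs, ys, x_new, n_models=200, seed=0):
    """Return (mean, standard deviation) of bootstrap-ensemble predictions at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        # Resample the training set with replacement and refit one member.
        idx = [rng.randrange(n) for _ in range(n)]
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a + b * x_new)
    return statistics.fmean(preds), statistics.stdev(preds)


# Synthetic data (an assumption for illustration): y = 1 + 2x plus Gaussian noise.
noise = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x + noise.gauss(0.0, 0.5) for x in xs]

mu_in, sd_in = ensemble_predict(xs, ys, 10.0)     # inside the training range
mu_out, sd_out = ensemble_predict(xs, ys, 100.0)  # far outside the training range
print(f"x=10:  {mu_in:.2f} +/- {sd_in:.2f}")
print(f"x=100: {mu_out:.2f} +/- {sd_out:.2f}")
```

In this toy setting the ensemble spread grows when the query point lies far from the training data, which is the behavior one hopes a per-prediction uncertainty will show.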
LG Nov 10, 2023
Learning material synthesis-process-structure-property relationship by data fusion: Bayesian Coregionalization N-Dimensional Piecewise Function Learning
A. Gilad Kusne, Austin McDannald, Brian DeCost
Autonomous materials research labs require the ability to combine and learn from diverse data streams. This is especially true for learning material synthesis-process-structure-property relationships, which are key to accelerating materials optimization and discovery as well as mechanistic understanding. We present the Synthesis-process-structure-property relAtionship coreGionalized lEarner (SAGE) algorithm, a fully Bayesian algorithm that uses multimodal coregionalization to merge knowledge across data sources to learn synthesis-process-structure-property relationships. SAGE outputs a probabilistic posterior for the relationships, including the most likely relationships given the data.
MTRL-SCI Nov 21, 2025
When Active Learning Fails, Uncalibrated Out of Distribution Uncertainty Quantification Might Be the Problem
Ashley S. Dale, Kangming Li, Brian DeCost et al.
Efficiently and meaningfully estimating prediction uncertainty is important for exploration in active learning campaigns in materials discovery, where samples with high uncertainty are interpreted as containing information missing from the model. In this work, the effect of different uncertainty estimation and calibration methods is evaluated for active learning when using ensembles of ALIGNN, eXtreme Gradient Boosting, Random Forest, and Neural Network model architectures. We compare uncertainty estimates from ALIGNN deep ensembles to loss landscape uncertainty estimates obtained for solubility, bandgap, and formation energy prediction tasks. We then evaluate how the quality of the uncertainty estimate impacts an active learning campaign that seeks model generalization to out-of-distribution data. Uncertainty calibration methods were found to generalize variably from in-domain data to out-of-domain data. Furthermore, calibrated uncertainties were generally unsuccessful in reducing the amount of data required by a model to improve during an active learning campaign on out-of-distribution data when compared to random sampling and uncalibrated uncertainties. The impact of poor-quality uncertainty persists for Random Forest and eXtreme Gradient Boosting models trained on the same data for the same tasks, indicating that this is at least partially intrinsic to the data and not due to model capacity alone. Analysis of the target, in-distribution uncertainty, out-of-distribution uncertainty, and training residual distributions suggests that future work should focus on understanding empirical uncertainties in the feature input space for cases where ensemble prediction variances do not accurately capture the missing information required for the model to generalize.
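One common way to check whether predicted uncertainties are calibrated is to compare nominal interval levels with the fraction of targets that actually fall inside them. The sketch below is an illustration under stated assumptions, not the evaluation protocol of the paper above: it assumes Gaussian predictive distributions, and the function name `empirical_coverage` and the toy numbers are hypothetical.

```python
from statistics import NormalDist


def empirical_coverage(y_true, y_pred, y_std, level):
    """Fraction of targets inside the central Gaussian interval of nominal `level`.

    `level` must lie strictly between 0 and 1 (e.g. 0.68 or 0.95).
    """
    # Half-width of the central interval in units of the predicted std.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = sum(abs(t - m) <= z * s for t, m, s in zip(y_true, y_pred, y_std))
    return hits / len(y_true)


# Toy values (assumptions for illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 2.5, 2.0, 4.0]
narrow = [0.2, 0.2, 0.2, 0.2]  # overconfident uncertainties
wide = [1.0, 1.0, 1.0, 1.0]    # more conservative uncertainties

for level in (0.68, 0.95):
    print(level,
          empirical_coverage(y_true, y_pred, narrow, level),
          empirical_coverage(y_true, y_pred, wide, level))
```

If the empirical coverage is well below the nominal level, the uncertainties are overconfident; if it is well above, they are underconfident. A check done on in-distribution data does not guarantee the same agreement on out-of-distribution data, which is the situation the abstract describes.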
MTRL-SCI Nov 15, 2021
Physics in the Machine: Integrating Physical Knowledge in Autonomous Phase-Mapping
A. Gilad Kusne, Austin McDannald, Brian DeCost et al.
Application of artificial intelligence (AI), and more specifically machine learning, to the physical sciences has expanded significantly over the past decades. In particular, science-informed AI, also known as scientific AI or inductive bias AI, has grown from a focus on data analysis to now controlling experiment design, simulation, execution and analysis in closed-loop autonomous systems. The CAMEO (closed-loop autonomous materials exploration and optimization) algorithm employs scientific AI to address two tasks: learning a material system's composition-structure relationship and identifying materials compositions with optimal functional properties. By integrating these, accelerated materials screening across compositional phase diagrams was demonstrated, resulting in the discovery of a best-in-class phase change memory material. Key to this success is the ability to guide subsequent measurements to maximize knowledge of the composition-structure relationship, or phase map. In this work we investigate the benefits of incorporating varying levels of prior physical knowledge into CAMEO's autonomous phase-mapping. This includes the use of ab-initio phase boundary data from the AFLOW repositories, which has been shown to optimize CAMEO's search when used as a prior.
MTRL-SCI Jun 11, 2020
On-the-fly Closed-loop Autonomous Materials Discovery via Bayesian Active Learning
A. Gilad Kusne, Heshan Yu, Changming Wu et al.
Active learning, the field of machine learning (ML) dedicated to optimal experiment design, has played a part in science as far back as the 18th century, when Laplace used it to guide his discovery of celestial mechanics [1]. In this work we focus a closed-loop, active learning-driven autonomous system on another major challenge: the discovery of advanced materials against the exceedingly complex synthesis-process-structure-property landscape. We demonstrate autonomous research methodology (i.e., autonomous hypothesis definition and evaluation) that can place complex, advanced materials in reach, allowing scientists to fail smarter, learn faster, and spend fewer resources in their studies, while simultaneously improving trust in scientific results and machine learning tools. Additionally, this robot science enables science-over-the-network, reducing the economic impact of scientists being physically separated from their labs. We used the real-time closed-loop, autonomous system for materials exploration and optimization (CAMEO) at the synchrotron beamline to accelerate the fundamentally interconnected tasks of rapid phase mapping and property optimization, with each cycle taking seconds to minutes, resulting in the discovery of a novel epitaxial nanocomposite phase-change memory material.
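The closed-loop pattern described above (pick the next measurement, run it, update, repeat) can be sketched in a few lines. This is a deliberately simplified stand-in and not the CAMEO algorithm: instead of a Bayesian model it uses the distance to the nearest measured point as a crude uncertainty proxy, and the names `nearest_distance`, `run_campaign`, and `oracle` are hypothetical.

```python
def nearest_distance(x, measured):
    """Crude uncertainty proxy: distance from x to the closest measured point."""
    return min(abs(x - m) for m in measured)


def run_campaign(candidates, oracle, start, budget):
    """Query the most 'uncertain' unmeasured candidate, `budget` times.

    Returns the query order and a dict mapping measured inputs to observed values.
    """
    measured = {start: oracle(start)}
    order = []
    for _ in range(budget):
        pool = [x for x in candidates if x not in measured]
        if not pool:
            break
        # Acquisition step: choose the candidate farthest from existing data.
        query = max(pool, key=lambda x: nearest_distance(x, measured))
        measured[query] = oracle(query)  # "run the experiment"
        order.append(query)
    return order, measured


# Toy problem (an assumption for illustration): a property that peaks at x = 7.
candidates = list(range(11))
order, measured = run_campaign(candidates, lambda x: -(x - 7) ** 2, start=0, budget=3)
best = max(measured, key=measured.get)
print("query order:", order)
print("best measured point:", best)
```

A real campaign would replace the distance proxy with a model-based uncertainty and would typically balance exploration against optimizing the target property.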
AI Nov 1, 2017
Building Data-driven Models with Microstructural Images: Generalization and Interpretability
Julia Ling, Maxwell Hutchinson, Erin Antono et al.
As data-driven methods rise in popularity in materials science applications, a key question is how these machine learning models can be used to understand microstructure. Given the importance of process-structure-property relations throughout materials science, it seems logical that models that can leverage microstructural data would be more capable of predicting property information. While there have been some recent attempts to use convolutional neural networks to understand microstructural images, these early studies have focused only on which featurizations yield the highest machine learning model accuracy for a single data set. This paper explores the use of convolutional neural networks for classifying microstructure with a more holistic set of objectives in mind: generalization between data sets, number of features required, and interpretability.