Outlier-Based Domain of Applicability Identification for Materials Property Prediction Models
This work addresses the need for reliable model deployment in materials science, though it appears incremental because it builds on existing outlier-based techniques.
The paper tackles the problem that machine learning models for material property prediction have unknown performance on unseen materials. It proposes a method to identify domains of applicability, which enables confidence assessment and model improvement.
Machine learning models have been widely applied to material property prediction. However, practical application of these models can be hindered by a lack of information about how well they will perform on previously unseen types of materials. Because the predictions of a machine learning model depend on the quality of the available training data, such models predict different domains of the material feature space with different levels of accuracy. Identifying these domains makes it possible to estimate the confidence level of each prediction, to determine when and how the model should be employed depending on the prediction accuracy requirements of different tasks, and to improve the model for domains with high errors. In this work, we propose a method to find domains of applicability using a large feature space, and we introduce analysis techniques to gain more insight into the detected domains and subdomains.
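To make the general idea of an outlier-based applicability check concrete, the following is a minimal sketch under stated assumptions. It is not the method proposed in this work, whose details are not given in this section. It assumes a mean k-nearest-neighbor distance as the outlier score and a quantile of the leave-one-out training scores as the cutoff. The function names (`knn_score`, `fit_threshold`, `in_domain`), the parameters `k` and `quantile`, and the toy feature vectors are all hypothetical.

```python
import math

def knn_score(train, x, k=3, skip_self=False):
    """Mean Euclidean distance from x to its k nearest training points."""
    dists = sorted(math.dist(x, t) for t in train)
    if skip_self:
        dists = dists[1:]  # drop the zero distance from x to itself
    return sum(dists[:k]) / k

def fit_threshold(train, k=3, quantile=0.95):
    """Score every training point leave-one-out and return the chosen quantile."""
    scores = sorted(knn_score(train, t, k, skip_self=True) for t in train)
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]

def in_domain(train, x, threshold, k=3):
    """Assumed rule: a point is in-domain if its score does not exceed the cutoff."""
    return knn_score(train, x, k) <= threshold

# Toy 2-D feature vectors (made-up numbers, not data from the paper).
train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (0.2, 0.1)]
threshold = fit_threshold(train, k=3)

near = in_domain(train, (0.05, 0.02), threshold)  # lies inside the training cloud
far = in_domain(train, (5.0, 5.0), threshold)     # lies far from every training point
print(near, far)
```

In this sketch, a query flagged as out-of-domain would be a candidate for a low-confidence label or for collecting more training data in that region, which mirrors the uses of domain identification described above.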