MTRL-SCI | DB | LG | Nov 15, 2022

ET-AL: Entropy-Targeted Active Learning for Bias Mitigation in Materials Data

arXiv:2211.07881v4 | 23 citations | h-index: 64
Originality: Incremental advance
AI Analysis

This work addresses bias mitigation in materials data for data-driven materials discovery. The advance is incremental, since it builds on existing active learning methods.

The paper tackles data bias from uneven coverage of materials families in materials databases by proposing an entropy-based metric and an entropy-targeted active learning framework to guide data acquisition, resulting in improved diversity and downstream model performance.

Growing materials data and data-driven informatics drastically promote the discovery and design of materials. While there are significant advancements in data-driven models, the quality of data resources is less studied despite its huge impact on model performance. In this work, we focus on data bias arising from uneven coverage of materials families in existing knowledge. Observing different diversities among crystal systems in common materials databases, we propose an information entropy-based metric for measuring this bias. To mitigate the bias, we develop an entropy-targeted active learning (ET-AL) framework, which guides the acquisition of new data to improve the diversity of underrepresented crystal systems. We demonstrate the capability of ET-AL for bias mitigation and the resulting improvement in downstream machine learning models. This approach is broadly applicable to data-driven materials discovery, including autonomous data acquisition and dataset trimming to reduce bias, as well as data-driven informatics in other scientific domains.
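The abstract names two ingredients: an information entropy-based diversity metric per crystal system, and an acquisition step that targets the underrepresented systems. The following is a minimal sketch of that idea, not the authors' implementation. It assumes diversity is measured as the Shannon entropy of a binned property distribution, that each candidate's property value is already available (in practice it would presumably come from a surrogate model's prediction), and that selection is greedy. The names `binned_entropy`, `pick_next`, and `bin_width` are hypothetical.

```python
import math
from collections import Counter


def binned_entropy(values, bin_width=0.5):
    """Shannon entropy (in nats) of values grouped into fixed-width bins.

    Assumption: diversity of a crystal system is approximated by how evenly
    its property values spread across bins. An empty list gives 0.0.
    """
    if not values:
        return 0.0
    counts = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def pick_next(labeled, candidates, bin_width=0.5):
    """One greedy acquisition step (illustrative only).

    labeled:    dict mapping crystal system -> list of known property values
    candidates: dict mapping crystal system -> list of candidate property
                values (assumed to be predictions, not measurements)

    Returns (target_system, chosen_value). The target is the system with the
    lowest entropy; the chosen value is the candidate whose addition raises
    that system's entropy the most, or None if no candidate exists for it.
    """
    entropies = {g: binned_entropy(v, bin_width) for g, v in labeled.items()}
    target = min(entropies, key=entropies.get)
    pool = candidates.get(target, [])
    if not pool:
        return target, None
    best = max(
        pool,
        key=lambda x: binned_entropy(labeled[target] + [x], bin_width),
    )
    return target, best


# Toy data: "cubic" values all fall in one bin, "triclinic" spans three bins.
labeled = {
    "cubic": [0.1, 0.2, 0.3, 0.4],
    "triclinic": [0.1, 1.1, 2.1],
}
candidates = {"cubic": [0.25, 1.7], "triclinic": [0.3]}

print(pick_next(labeled, candidates))  # -> ('cubic', 1.7)
```

In this toy case the cubic system has zero entropy because all four values share a bin, so it becomes the target, and the candidate at 1.7 is chosen because it opens a new bin. Repeating the step after adding each acquired point gives a simple loop; a realistic version would also need to handle prediction uncertainty, which this sketch ignores.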

Code Implementations: 1 repo
