Towards Interpretable and Learnable Risk Analysis for Entity Resolution
This paper addresses the risk analysis problem in entity resolution. The contribution is incremental: it builds on existing machine learning methods by adding interpretability and learnability to the prediction of mislabeled pairs.
The paper tackles the problem of predicting and interpreting which entity pairs are mislabeled in machine-learning-based entity resolution, proposing an interpretable and learnable framework that ranks pairs by mislabeling risk and achieves considerably higher accuracy in identifying mislabeled pairs than existing alternatives.
Machine-learning-based entity resolution has been widely studied. However, machine learning models may mislabel some entity pairs, and existing work has not addressed the risk analysis problem: predicting and interpreting which entity pairs are mislabeled. In this paper, we propose an interpretable and learnable framework for risk analysis, which ranks the labeled pairs by their risk of being mislabeled. We first describe how to automatically generate interpretable risk features, and then present a learnable risk model and its training technique. Finally, we empirically evaluate the proposed approach on real data. Our extensive experiments show that the learnable risk model identifies mislabeled pairs with considerably higher accuracy than the existing alternatives.
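The abstract does not specify how the risk features are generated or what form the risk model takes. The code below is only a minimal sketch of the overall idea: score each machine-labeled pair with a learned model over human-readable risk features, rank pairs by that score, and report the fired features as the explanation. It assumes hand-written threshold rules as the "interpretable risk features" and plain logistic regression as the "learnable risk model". Both are swapped-in stand-ins, not the paper's method. All names (`LabeledPair`, `RISK_FEATURES`, `train_risk_model`, `rank_by_risk`, `explain`) and the toy data are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class LabeledPair:
    pair_id: str
    similarity: float     # hypothetical attribute similarity of the two records, in [0, 1]
    machine_label: int    # label assigned by the ER classifier: 1 = match, 0 = unmatch
    confidence: float     # classifier's probability for the label it assigned


# Hypothetical interpretable risk features: each is a readable rule that
# returns 1.0 when it fires on a labeled pair and 0.0 otherwise.
RISK_FEATURES: Dict[str, Callable[[LabeledPair], float]] = {
    "match but low similarity": lambda p: float(p.machine_label == 1 and p.similarity < 0.5),
    "unmatch but high similarity": lambda p: float(p.machine_label == 0 and p.similarity > 0.8),
    "low classifier confidence": lambda p: float(p.confidence < 0.6),
}


def featurize(p: LabeledPair) -> List[float]:
    return [rule(p) for rule in RISK_FEATURES.values()]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_risk_model(pairs: List[LabeledPair], mislabeled: List[int],
                     epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Learn one weight per risk feature plus a bias by per-example gradient
    descent on the logistic loss; mislabeled[i] is 1 if pairs[i] is known to be wrong."""
    w = [0.0] * len(RISK_FEATURES)
    b = 0.0
    for _ in range(epochs):
        for p, y in zip(pairs, mislabeled):
            x = featurize(p)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def risk_score(p: LabeledPair, w: List[float], b: float) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, featurize(p))) + b)


def rank_by_risk(pairs: List[LabeledPair], w: List[float], b: float) -> List[LabeledPair]:
    # Highest estimated mislabeling risk first.
    return sorted(pairs, key=lambda p: risk_score(p, w, b), reverse=True)


def explain(p: LabeledPair, w: List[float]) -> List[Tuple[str, float]]:
    # The fired rules and their learned weights serve as the interpretation.
    return [(name, wi) for (name, rule), wi in zip(RISK_FEATURES.items(), w) if rule(p)]


# Toy validation pairs whose true status is known (1 = mislabeled by the classifier).
train = [
    LabeledPair("t1", 0.30, 1, 0.55),
    LabeledPair("t2", 0.90, 0, 0.58),
    LabeledPair("t3", 0.95, 1, 0.97),
    LabeledPair("t4", 0.10, 0, 0.99),
    LabeledPair("t5", 0.40, 1, 0.90),
    LabeledPair("t6", 0.85, 1, 0.70),
]
flags = [1, 1, 0, 0, 1, 0]
w, b = train_risk_model(train, flags)

# Rank new machine-labeled pairs; q1 fires two risk rules, q2 fires none.
queue = [LabeledPair("q2", 0.90, 1, 0.95), LabeledPair("q1", 0.20, 1, 0.50)]
for p in rank_by_risk(queue, w, b):
    print(p.pair_id, round(risk_score(p, w, b), 3), [name for name, _ in explain(p, w)])
```

Logistic regression is used here only because each learned weight attaches directly to one readable rule, which keeps the ranking explainable; the paper's actual risk model and training technique are not described in this summary and may differ substantially.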