DB CY LG | Apr 10, 2024

FairEM360: A Suite for Responsible Entity Matching

Nima Shahbazi, Mahdi Erfanian, Abolfazl Asudeh, Fatemeh Nargesian, Divesh Srivastava

arXiv:2404.07354v22.31 citations | h-index: 21 | Has Code | Proc VLDB Endow

Originality: Synthesis-oriented

AI Analysis

This paper addresses fairness issues in entity matching for data pipeline practitioners, but its contribution is incremental: it builds on existing fairness measures and ensemble methods.

The paper tackles the problem of unintentional biases in entity matching, which can affect downstream data quality, by introducing FairEM360, a framework that audits fairness, explains unfairness, and provides resolutions through human-in-the-loop feedback.

Entity matching is one of the earliest tasks that occur in the big data pipeline and is alarmingly exposed to unintentional biases that affect the quality of data. Identifying and mitigating the biases that exist in the data or are introduced by the matcher at this stage can contribute to promoting fairness in downstream tasks. This demonstration showcases FairEM360, a framework for 1) auditing the output of entity matchers across a wide range of fairness measures and paradigms, 2) providing potential explanations for the underlying reasons for unfairness, and 3) providing resolutions for the unfairness issues through an exploratory process with human-in-the-loop feedback, utilizing an ensemble of matchers. We aspire for FairEM360 to contribute to the prioritization of fairness as a key consideration in the evaluation of EM pipelines.
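As a rough illustration of what auditing a matcher's output against one fairness measure could look like, the sketch below compares each group's true positive rate with the overall rate and flags groups that fall behind. It is not FairEM360's API or the authors' method: the record layout (`group`, `label`, `pred`), the function names, the toy data, and the 0.1 disparity threshold are all assumptions made for this example.

```python
from collections import defaultdict


def true_positive_rate(pairs):
    """TPR = correctly predicted matches / all true matches (None if no true matches)."""
    positives = [p for p in pairs if p["label"] == 1]
    if not positives:
        return None
    return sum(p["pred"] for p in positives) / len(positives)


def audit_tpr_parity(pairs, threshold=0.1):
    """Compare each group's TPR with the overall TPR.

    A group is flagged when its TPR is lower than the overall TPR by more
    than `threshold` (a hypothetical cutoff chosen for illustration).
    """
    by_group = defaultdict(list)
    for p in pairs:
        by_group[p["group"]].append(p)

    overall = true_positive_rate(pairs)
    if overall is None:
        return {}  # no true matches at all, so TPR is undefined

    report = {}
    for group, members in by_group.items():
        tpr = true_positive_rate(members)
        if tpr is None:
            continue  # group has no true matches to evaluate
        gap = overall - tpr
        report[group] = {"tpr": tpr, "gap": gap, "unfair": gap > threshold}
    return report


# Toy matcher output: one dict per candidate record pair (hypothetical data).
pairs = [
    {"group": "A", "label": 1, "pred": 1},
    {"group": "A", "label": 1, "pred": 1},
    {"group": "A", "label": 0, "pred": 0},
    {"group": "B", "label": 1, "pred": 0},
    {"group": "B", "label": 1, "pred": 1},
    {"group": "B", "label": 0, "pred": 0},
]

report = audit_tpr_parity(pairs)
for group, stats in sorted(report.items()):
    print(group, stats)
```

On this toy input, group B's true positive rate (0.5) trails the overall rate (0.75), so it is flagged, while group A is not. An audit across many measures would repeat the same pattern with other per-group statistics, such as false positive rate or precision.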
