CL · Nov 20, 2021

Combining Data-driven Supervision with Human-in-the-loop Feedback for Entity Resolution

Wenpeng Yin, Shelby Heinecke, Jia Li, Nitish Shirish Keskar, Michael Jones, Shouzhong Shi, Stanislav Georgiev, Kurt Milich, Joseph Esposito, Caiming Xiong

arXiv:2111.10497v1 · 0.51 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses distribution shift in production environments, a problem for practitioners building entity resolution systems, but its contribution is incremental: it applies existing human-in-the-loop methods to a specific case.

The paper tackles the performance gap between training and production data in an entity resolution system with a human-in-the-loop, data-centric solution, which improves the model's adaptation to real-world variations.

The distribution gap between training datasets and data encountered in production is well acknowledged. Training datasets are often constructed over a fixed period of time and by carefully curating the data to be labeled. Thus, training datasets may not contain all possible variations of data that could be encountered in real-world production environments. Tasked with building an entity resolution system - a model that identifies and consolidates data points that represent the same person - our first model exhibited a clear training-production performance gap. In this case study, we discuss our human-in-the-loop enabled, data-centric solution to closing the training-production performance divergence. We conclude with takeaways that apply to data-centric learning at large.
