CL DB LG

Jan 12, 2023

KAER: A Knowledge Augmented Pre-Trained Language Model for Entity Resolution

Liri Fang, Lan Li, Yiren Liu, Vetle I. Torvik, Bertram Ludäscher

arXiv:2301.04770v12.510 citations

h-index: 45

Originality: Incremental advance

AI Analysis

This work addresses entity resolution for data cleaning by augmenting pre-trained language models with external knowledge. The advance is incremental because it builds on existing methods.

The paper tackles entity resolution by proposing KAER, a framework that augments pre-trained language models with external knowledge, improving performance over the state-of-the-art method Ditto, particularly on dirty data and textual/online product datasets.

Entity resolution has been an essential and well-studied task in data cleaning research for decades. Existing work has discussed the feasibility of using pre-trained language models to perform entity resolution and has achieved promising results. However, few works have discussed injecting domain knowledge to improve the performance of pre-trained language models on entity resolution tasks. In this study, we propose Knowledge Augmented Entity Resolution (KAER), a novel framework for augmenting pre-trained language models with external knowledge for entity resolution. We discuss the results of using different knowledge augmentation and prompting methods to improve entity resolution performance. Our model improves on Ditto, the existing state-of-the-art entity resolution method. In particular, 1) KAER performs more robustly and achieves better results on "dirty data"; 2) with more general knowledge injection, KAER outperforms the existing baseline models on the textual dataset and on the dataset from the online product domain; and 3) KAER achieves competitive results on highly domain-specific datasets, such as citation datasets, where further gains will require the injection of expert knowledge in future work.
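To make the idea of knowledge injection concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Ditto-style serialization in which each record is flattened to `[COL] name [VAL] value` tokens and the two records of a pair are joined with `[SEP]`, and it injects knowledge by appending a coarse entity type to any value found in a lookup table. The table `TOY_KB`, the functions `serialize` and `serialize_pair`, and the parenthesized hint format are all hypothetical names and choices made for illustration.

```python
from typing import Dict, Mapping

# Hypothetical knowledge source: lower-cased surface form -> coarse entity type.
# A real system would obtain this from entity linking or type detection.
TOY_KB: Dict[str, str] = {
    "apple": "ORGANIZATION",
    "sony": "ORGANIZATION",
    "iphone 13": "PRODUCT",
}


def serialize(record: Mapping[str, str], kb: Mapping[str, str]) -> str:
    """Flatten one record to '[COL] name [VAL] value' tokens.

    If a value is found in the knowledge table, its type is appended
    in parentheses (an assumed, illustrative injection format).
    """
    parts = []
    for col, val in record.items():
        hint = kb.get(str(val).strip().lower())
        val_text = f"{val} ({hint})" if hint else str(val)
        parts.append(f"[COL] {col} [VAL] {val_text}")
    return " ".join(parts)


def serialize_pair(left: Mapping[str, str],
                   right: Mapping[str, str],
                   kb: Mapping[str, str]) -> str:
    """Join two serialized records into one sequence for a pair classifier."""
    return f"{serialize(left, kb)} [SEP] {serialize(right, kb)}"


if __name__ == "__main__":
    a = {"brand": "Apple", "title": "iPhone 13"}
    b = {"brand": "apple inc", "title": "iphone 13"}
    print(serialize_pair(a, b, TOY_KB))
```

The resulting string would then be passed to a fine-tuned pre-trained language model that classifies the pair as match or non-match. Values missing from the table, such as "apple inc" here, are serialized unchanged, so the sketch degrades to plain serialization when no knowledge is available.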
