CL Jun 13, 2024

Learning from Natural Language Explanations for Generalizable Entity Matching

Somin Wadhwa, Adit Krishnan, Runhui Wang, Byron C. Wallace, Chris Kong

arXiv:2406.09330v2 14.126 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of scalable, generalizable entity matching for data integration, offering an incremental improvement over existing methods.

The paper tackles poor generalization and high inference cost in entity matching by recasting the task as conditional generation and distilling LLM reasoning into smaller models through natural language explanations. It reports a 10.85% F-1 improvement on out-of-domain generalization tests.

Entity matching is the task of linking records from different sources that refer to the same real-world entity. Past work has primarily treated entity matching as a standard supervised learning problem. However, supervised entity matching models often do not generalize well to new data, and collecting exhaustive labeled training data is often cost-prohibitive. Recent efforts have adopted LLMs for this task in few- and zero-shot settings, exploiting their general knowledge, but LLMs are prohibitively expensive for inference at the scale of real-world entity matching tasks. As an efficient alternative, we re-cast entity matching as a conditional generation task rather than binary classification. This enables us to "distill" LLM reasoning into smaller entity matching models via natural language explanations. This approach achieves strong performance, especially on out-of-domain generalization tests (10.85% F-1) where standalone generative methods struggle. We perform ablations that highlight the importance of explanations, both for performance and model robustness.
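To make the recasting concrete, here is a minimal sketch of how one record pair might be turned into a source/target text pair for a small sequence-to-sequence model. The prompt wording, the `match` / `no match` label strings, the `Explanation:` separator, and all function names are illustrative assumptions; the abstract does not specify the actual serialization, and in the described setup the explanation text would come from an LLM rather than being written by hand.

```python
from dataclasses import dataclass


@dataclass
class RecordPair:
    """Two records that may or may not describe the same real-world entity."""
    left: dict
    right: dict


def serialize_record(record: dict) -> str:
    # Flatten attribute/value pairs into one string (hypothetical format).
    return "; ".join(f"{key}: {value}" for key, value in record.items())


def build_source(pair: RecordPair) -> str:
    # Input text the smaller model conditions on.
    return (
        "Do these two records refer to the same entity? "
        f"Record A: {serialize_record(pair.left)} "
        f"Record B: {serialize_record(pair.right)}"
    )


def build_target(is_match: bool, explanation: str) -> str:
    # Output text: the label followed by a natural language explanation,
    # instead of a bare binary class.
    label = "match" if is_match else "no match"
    return f"{label}. Explanation: {explanation}"


def parse_label(generated: str) -> bool:
    # Recover the binary decision from generated text.
    return generated.strip().lower().startswith("match")
```

A pair such as `RecordPair({"title": "iPhone 13"}, {"title": "Apple iPhone 13 128GB"})` would yield one training example whose target begins with the label and continues with the explanation. At inference time only the leading label needs to be parsed.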
