CL IRApr 23

Unlocking the Power of Large Language Models for Multi-table Entity Matching

Yingkai Tang, Taoyu Su, Wenyuan Zhang, Xiaoyang Guo, Tingwen Liu

arXiv:2604.2123886.6h-index: 3Has Code

Predicted impact top 12% in CL · last 90 daysOriginality Incremental advance

AI Analysis

For data integration practitioners, this provides a practical LLM-based solution to multi-table entity matching with measurable gains.

LLM4MEM improves multi-table entity matching by 5.1% average F1 over baselines across 6 datasets, using LLMs to handle semantic inconsistencies and efficiency issues.

Multi-table entity matching (MEM) addresses the limitations of dual-table approaches by enabling simultaneous identification of equivalent entities across multiple data sources without unique identifiers. However, existing methods relying on pre-trained language models struggle to handle semantic inconsistencies caused by numerical attribute variations. Inspired by the powerful language understanding capabilities of large language models (LLMs), we propose a novel LLM-based framework for multi-table entity matching, termed LLM4MEM. Specifically, we first propose a multi-style prompt-enhanced LLM attribute coordination module to address semantic inconsistencies. Then, to alleviate the matching efficiency problem caused by the surge in the number of entities brought by multiple data sources, we develop a transitive consensus embedding matching module to tackle entity embedding and pre-matching issues. Finally, to address the issue of noisy entities during the matching process, we introduce a density-aware pruning module to optimize the quality of multi-table entity matching. We conducted extensive experiments on 6 MEM datasets, and the results show that our model improves by an average of 5.1% in F1 compared with the baseline model. Our code is available at https://github.com/Ymeki/LLM4MEM.

View on arXiv PDF Code

Similar