DB · AI · Mar 3, 2024

ReMatch: Retrieval Enhanced Schema Matching with LLMs

Eitam Sheetrit, Menachem Brief, Moshik Mishaeli, Oren Elisha

Microsoft

arXiv:2403.01567v2 · 10.331 citations · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

It addresses schema matching challenges for data integration, offering a practical solution without requiring training data, though it appears incremental as it builds on existing LLM approaches.

The paper tackles the problem of schema matching in data integration by introducing ReMatch, a retrieval-enhanced LLM method that eliminates the need for training data or access to source database data, achieving effective results on large real-world schemas.

Schema matching is a crucial task in data integration, involving the alignment of a source schema with a target schema to establish correspondence between their elements. This task is challenging due to textual and semantic heterogeneity, as well as differences in schema sizes. Although machine-learning-based solutions have been explored in numerous studies, they often suffer from low accuracy, require manual mapping of the schemas for model training, or need access to source schema data which might be unavailable due to privacy concerns. In this paper we present a novel method, named ReMatch, for matching schemas using retrieval-enhanced Large Language Models (LLMs). Our method avoids the need for predefined mapping, any model training, or access to data in the source database. Our experimental results on large real-world schemas demonstrate that ReMatch is an effective matcher. By eliminating the requirement for training data, ReMatch becomes a viable solution for real-world scenarios.
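The abstract does not spell out the pipeline, so the following is only a minimal sketch of the general retrieve-then-prompt idea, not the authors' implementation. It assumes a bag-of-words cosine similarity as a stand-in for whatever retriever ReMatch actually uses, and every name in it (`retrieve_candidates`, `build_prompt`, the toy `target_tables`) is hypothetical. It uses only schema text such as table and column names, never row data, and it stops at building the prompt rather than calling an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(source_attr, target_tables, k=2):
    """Return the k target tables most similar to the source attribute text.

    Assumption: similarity is plain token overlap; a real system would
    more likely use learned text embeddings.
    """
    query = Counter(tokenize(source_attr))
    scored = []
    for name, columns in target_tables.items():
        doc = Counter(tokenize(name + " " + " ".join(columns)))
        scored.append((cosine(query, doc), name))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def build_prompt(source_attr, candidates, target_tables):
    """Assemble an LLM prompt that lists only the retrieved target tables."""
    lines = [
        "Match the source attribute to the best target column.",
        f"Source attribute: {source_attr}",
        "Candidate target tables:",
    ]
    for name in candidates:
        lines.append(f"- {name}({', '.join(target_tables[name])})")
    return "\n".join(lines)


# Hypothetical toy target schema (illustration only).
target_tables = {
    "person": ["person_id", "birth_date", "gender"],
    "visit": ["visit_id", "visit_start_date", "person_id"],
    "drug_exposure": ["drug_id", "dose", "person_id"],
}
source_attr = "patients.date_of_birth: birth date of the patient"

candidates = retrieve_candidates(source_attr, target_tables, k=2)
prompt = build_prompt(source_attr, candidates, target_tables)
```

Restricting the prompt to a few retrieved tables is what keeps it short when the target schema is large; the ranking itself needs no labeled mappings or model training, which is the property the abstract emphasizes.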
