DB · AI · May 30, 2025

Towards Scalable Schema Mapping using Large Language Models

arXiv:2505.24716v1 · 8 citations · h-index: 11 · Proceedings of the 1st workshop connecting academia and industry on Modern Integrated Database and AI Systems
Originality: Incremental advance
AI Analysis

This work addresses the problem of costly, manual schema mapping in data integration systems, though the advance appears incremental because it builds on existing LLM-based approaches.

The paper tackles scalability challenges in data integration by using large language models (LLMs) for schema mapping. It targets three issues: inconsistent outputs, the need for more expressive mappings, and the computational cost of repeated LLM calls, using techniques such as sampling, aggregation, and data type prefiltering.

The growing need to integrate information from a large number of diverse sources poses significant scalability challenges for data integration systems. These systems often rely on manually written schema mappings, which are complex, source-specific, and costly to maintain as sources evolve. While recent advances suggest that large language models (LLMs) can assist in automating schema matching by leveraging both structural and natural language cues, key challenges remain. In this paper, we identify three core issues with using LLMs for schema mapping: (1) inconsistent outputs due to sensitivity to input phrasing and structure, which we propose methods to address through sampling and aggregation techniques; (2) the need for more expressive mappings (e.g., GLaV), which strain the limited context windows of LLMs; and (3) the computational cost of repeated LLM calls, which we propose to mitigate through strategies like data type prefiltering.
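The abstract names two techniques without giving implementation details: sampling with aggregation (to counter inconsistent outputs) and data type prefiltering (to cut the number of LLM calls). The sketch below shows one plausible reading of each and is not the authors' implementation. The type families in `TYPE_FAMILY`, the majority-vote rule, the shuffling of candidate order, and the `ask_llm` callable are all assumptions introduced for illustration.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type families; a real system would derive these from the DBMS catalog.
TYPE_FAMILY = {
    "int": "numeric", "bigint": "numeric", "float": "numeric", "decimal": "numeric",
    "varchar": "text", "char": "text", "text": "text",
    "date": "temporal", "timestamp": "temporal",
}


def types_compatible(s_type: str, t_type: str) -> bool:
    """Return True if two declared column types could plausibly match."""
    s_fam = TYPE_FAMILY.get(s_type.lower())
    t_fam = TYPE_FAMILY.get(t_type.lower())
    if s_fam is None or t_fam is None:
        return True  # unknown type: stay conservative and do not prune
    return s_fam == t_fam


def prefilter_by_type(
    source_cols: Dict[str, str], target_cols: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Keep only (source, target) column pairs with compatible types,
    so that later LLM calls consider fewer candidates."""
    return [
        (s, t)
        for s, s_type in source_cols.items()
        for t, t_type in target_cols.items()
        if types_compatible(s_type, t_type)
    ]


def sample_and_aggregate(
    source_col: str,
    candidates: List[str],
    ask_llm: Callable[[str, List[str]], str],  # hypothetical LLM wrapper
    n_samples: int = 5,
    min_share: float = 0.5,
    seed: int = 0,
) -> Optional[str]:
    """Query the matcher several times with shuffled candidate order and
    return the majority answer, or None if no answer is frequent enough."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_samples):
        shuffled = candidates[:]
        rng.shuffle(shuffled)  # vary input structure between samples
        answer = ask_llm(source_col, shuffled)
        if answer in candidates:  # ignore answers outside the candidate set
            votes[answer] += 1
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    return best if count / n_samples >= min_share else None
```

In this sketch the prefilter runs first, and `sample_and_aggregate` is then called once per source column on the surviving candidates. Returning `None` when no answer reaches `min_share` leaves low-agreement columns for manual review rather than forcing a match.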

