RACT: Retrieval Augmented Column-Table Learning and Prediction for Multi-Table Schema Matching

Leonard Traeger, Enas Khwaileh, Andreas Behrend, George Karabatis

arXiv:2606.07843

h-index: 4

Originality: Incremental advance

AI Analysis

For data integration practitioners, this provides a method to handle heterogeneous schema designs where columns with similar meaning reside in different table contexts.

A self-supervised retrieval framework exploits referential context to improve schema matching for multi-table schemas, raising matching precision and completeness by up to 70% over similarity-based baselines.

Schema matching, a critical task for integrating data from diverse sources, seeks to identify correspondences between columns across different schemas. In multi-table holistic schema matching, heterogeneous schema designs mean that columns with similar semantic meaning may reside in tables with different contexts, a setting in which similarity-based techniques are inadequate. This paper incorporates referential context into schema matching by introducing RACT learning and prediction, a self-supervised framework that probabilistically retrieves candidate tables for each source column and thereby constrains the set of relevant column candidates. Experiments demonstrate that this approach outperforms similarity-based baselines on matching multi-table schemas. In subsequent matching experiments, constraining the column search space to the top-t retrieved tables improves both average matching precision and completeness by up to 70%.
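The idea of constraining the column search space to the top-t tables can be illustrated with a minimal Python sketch. This is not the RACT implementation: the table scores are made-up numbers standing in for whatever probabilities a trained retrieval model would produce, the column matcher is a plain string similarity from the standard library, and all names (`top_t_tables`, `match_column`, the toy schema) are hypothetical.

```python
from difflib import SequenceMatcher


def top_t_tables(table_scores, t):
    """Return the names of the t highest-scoring candidate tables."""
    ranked = sorted(table_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:t]]


def name_similarity(a, b):
    """Stand-in column matcher: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_column(source_column, target_schema, table_scores, t):
    """Match a source column against columns of the top-t tables only."""
    candidates = [
        (table, column)
        for table in top_t_tables(table_scores, t)
        for column in target_schema[table]
    ]
    # Returns None when t == 0 leaves no candidates.
    return max(
        candidates,
        key=lambda tc: name_similarity(source_column, tc[1]),
        default=None,
    )


# Toy target schema (assumed for illustration): table name -> column names.
target_schema = {
    "customers": ["id", "name", "city"],
    "orders": ["id", "customer_id", "total"],
    "products": ["id", "name", "price"],
}

# Assumed retrieval scores for the source column "name" of a source table
# "client"; in practice these would come from a learned retrieval model.
table_scores = {"customers": 0.80, "products": 0.15, "orders": 0.05}

best = match_column("name", target_schema, table_scores, t=1)
print(best)
```

In this toy schema, a purely name-based matcher cannot distinguish `customers.name` from `products.name`. Restricting the candidates to the top-ranked table removes that ambiguity before any column comparison takes place, which is the effect the abstract attributes to table retrieval.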
