CL CY IR | Jul 6, 2023

InfoSync: Information Synchronization across Multilingual Semi-structured Tables

arXiv:2307.03313v1224 citations | h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses the challenge of keeping multilingual data consistent in resources such as Wikipedia. The advance is incremental: it builds on existing table alignment and update techniques.

The paper tackles the problem of synchronizing semi-structured data across languages, specifically Wikipedia Infobox tables, by introducing a new dataset (InfoSync) and a two-step method for alignment and updating. The method achieves an F1 score of 87.91 for information alignment and a 77.28% acceptance rate for Wikipedia edits.

Information synchronization of semi-structured data across languages is challenging. For instance, Wikipedia tables in one language should be synchronized with their counterparts in other languages. To address this problem, we introduce a new dataset, InfoSync, and a two-step method for tabular synchronization. InfoSync contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset (3.5K pairs) is manually annotated. The proposed method includes 1) Information Alignment, which maps rows between tables in different languages, and 2) Information Update, which fills in missing or outdated information in the aligned tables. When evaluated on InfoSync, information alignment achieves an F1 score of 87.91 (en <-> non-en). To evaluate information update, we perform human-assisted Wikipedia edits on Infoboxes for 603 table pairs. Our approach obtains an acceptance rate of 77.28% on Wikipedia, showing the effectiveness of the proposed method.
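To make the two steps and the F1 metric concrete, here is a minimal sketch in Python. It is not the paper's method: the key dictionary `KEY_MAP`, the toy tables, the gold pairs, and the function names `align_rows`, `unaligned_rows`, and `f1` are all assumptions invented for illustration, and a real system would need translation or multilingual similarity rather than a hand-written lookup.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (English row key, French row key)

# Hypothetical French -> English key dictionary (an assumption of this sketch).
KEY_MAP: Dict[str, str] = {
    "naissance": "born",
    "nationalité": "nationality",
    "profession": "occupation",
}

def align_rows(en: Dict[str, str], fr: Dict[str, str],
               key_map: Dict[str, str]) -> Set[Pair]:
    """Step 1 (alignment): pair a French row with an English row when the
    mapped French key is present in the English table."""
    return {(key_map[k], k) for k in fr if key_map.get(k) in en}

def unaligned_rows(fr: Dict[str, str], aligned: Set[Pair]) -> List[str]:
    """Step 2 (update candidates): French rows with no aligned English row
    may carry information missing from the English table."""
    aligned_fr = {f for _, f in aligned}
    return sorted(k for k in fr if k not in aligned_fr)

def f1(pred: Set[Pair], gold: Set[Pair]) -> float:
    """F1 of predicted row pairs against gold row pairs."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy infobox tables, invented for illustration only.
en_table = {"born": "1867", "nationality": "Polish, French", "fields": "Physics"}
fr_table = {
    "naissance": "1867",
    "nationalité": "polonaise, française",
    "domaines": "physique",        # not covered by KEY_MAP, so it stays unaligned
    "profession": "physicienne",   # no "occupation" row in the English table
}

# Gold alignment a human annotator might produce for these toy tables.
gold_pairs = {("born", "naissance"), ("nationality", "nationalité"),
              ("fields", "domaines")}

pred_pairs = align_rows(en_table, fr_table, KEY_MAP)
candidates = unaligned_rows(fr_table, pred_pairs)
score = f1(pred_pairs, gold_pairs)
```

In this toy setup the alignment misses the `fields`/`domaines` pair, so `domaines` is wrongly flagged as an update candidate alongside `profession`. That is one reason alignment quality matters before any update is proposed: errors in step 1 propagate directly into step 2.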

