LG · Jun 1

Segment-driven Structural Induction and Semantic Alignment for Heterogeneous Tabular Representation

arXiv:2606.01890
AI Analysis

For practitioners working with heterogeneous tables, NAVI offers a pretraining framework that better handles varying headers and shared semantics, though improvements are shown only on in-domain data.

NAVI improves heterogeneous tabular representation by treating header-value pairs as units for structural and distributional evidence, achieving better reconstruction, semantic consistency, and downstream utility on in-domain tables.

Real-world domains often contain heterogeneous tables whose headers vary while their underlying attribute semantics are shared, making it difficult to induce domain-specialized semantics from table-local evidence alone. Existing encoders model parts of this problem, but often underuse column-level value distributions and apply uniform objectives across attributes with different semantic roles. We propose NAVI, a segment-centric pretraining framework that treats each header-value pair as the unit for aggregating schema-level structural evidence and column-level distributional evidence. We realize this design through Masked Segment Modeling and Entropy-driven Segment Alignment, which jointly enforce structured header-value coupling and semantic alignment across stable and instance-specific attributes. Experiments on heterogeneous in-domain tables show overall improvements in reconstruction, semantic consistency, and downstream utility across evaluation settings.
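To make the segment idea concrete, here is a minimal sketch of what treating a header-value pair as the unit might look like. It is not NAVI's implementation: the abstract gives no architectural details, so every name here (`to_segments`, `mask_segments`, `column_entropy`, the `[MASK]` token, the 0.3 mask ratio) is a hypothetical choice for illustration. The sketch assumes that masking operates on whole values with the header left visible, and that column-level Shannon entropy is one plausible signal for separating stable attributes (low entropy) from instance-specific ones (high entropy).

```python
import math
import random
from collections import Counter


def to_segments(row):
    """Turn one table row (a dict) into a list of (header, value) segments."""
    return [(header, str(value)) for header, value in row.items()]


def mask_segments(segments, mask_ratio=0.3, rng=None, mask_token="[MASK]"):
    """Hide the value of randomly chosen segments while keeping headers visible.

    Returns (masked_segments, targets), where targets maps the index of each
    masked segment to its original value. Illustrative only; the real
    objective may mask differently.
    """
    if not segments:
        return [], {}
    rng = rng or random.Random(0)
    # Mask at least one segment, but never more than exist.
    n_mask = min(len(segments), max(1, round(mask_ratio * len(segments))))
    chosen = set(rng.sample(range(len(segments)), n_mask))
    masked, targets = [], {}
    for i, (header, value) in enumerate(segments):
        if i in chosen:
            masked.append((header, mask_token))
            targets[i] = value
        else:
            masked.append((header, value))
    return masked, targets


def column_entropy(rows, header):
    """Shannon entropy (in bits) of the empirical value distribution of a column."""
    counts = Counter(str(row[header]) for row in rows if header in row)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy table: "country" is constant (stable), "id" is unique per row.
rows = [
    {"country": "US", "id": 1},
    {"country": "US", "id": 2},
    {"country": "US", "id": 3},
    {"country": "US", "id": 4},
]
masked, targets = mask_segments(to_segments(rows[0]))
entropies = {h: column_entropy(rows, h) for h in ("country", "id")}
```

In this toy table the constant `country` column has zero entropy and the unique `id` column has 2 bits, which is the kind of column-level distributional evidence the abstract says existing encoders underuse. How NAVI actually turns such evidence into an alignment objective is not specified in the text above.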
