IR · Feb 17, 2018

TabVec: Table Vectors for Classification of Web Tables

arXiv:1802.06290v1 · 37 citations
Originality: Highly original
AI Analysis

TabVec addresses the challenge of using information from hundreds of millions of web tables whose structures vary widely. Unlike existing methods, it does not need significant domain-specific training.

The paper tackles the problem of classifying diverse web tables into five categories (entity, relational, matrix, list, and non-data). It introduces TabVec, an unsupervised method that embeds tables into a vector space using the syntax and semantics of their cells and the structure of the table. Without requiring domain annotations, TabVec improves classification accuracy by more than 20% compared to three state-of-the-art systems.

There are hundreds of millions of tables in Web pages that contain useful information for many applications. Leveraging data within these tables is difficult because of the wide variety of structures, formats, and data encoded in them. TabVec is an unsupervised method to embed tables into a vector space to support classification of tables into categories (entity, relational, matrix, list, and non-data) with minimal user intervention. TabVec uses the syntax and semantics of table cells, and embeds the structure of tables in a table vector space. This enables superior classification of tables even in the absence of domain annotations. Our evaluations in four real-world domains show that TabVec improves classification accuracy by more than 20% compared to three state-of-the-art systems, and that those systems require significant in-domain training to achieve good results.
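The abstract does not spell out how a table becomes a vector, so the following is only a toy illustration of the general idea, not TabVec itself. It assumes hand-picked syntactic cell features (digit ratio, letter ratio, token count) and measures structure as the average change between neighbouring cells; the names `cell_features`, `table_vector`, and `cosine` are hypothetical. The paper's method learns its representation rather than using fixed features like these.

```python
import math
from typing import List

Table = List[List[str]]  # rows of cell strings


def cell_features(cell: str) -> List[float]:
    """Crude syntactic features of one cell (an assumed, hand-picked set)."""
    text = cell.strip()
    if not text:
        return [0.0, 0.0, 0.0]
    digits = sum(ch.isdigit() for ch in text)
    letters = sum(ch.isalpha() for ch in text)
    tokens = min(len(text.split()), 10)
    return [digits / len(text), letters / len(text), tokens / 10]


def l1(a: List[float], b: List[float]) -> float:
    """L1 distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average(values: List[float]) -> float:
    """Mean of a list, or 0.0 when the list is empty."""
    return sum(values) / len(values) if values else 0.0


def table_vector(table: Table) -> List[float]:
    """Content features plus two structure features for a whole table."""
    feats = [[cell_features(c) for c in row] for row in table]
    cells = [f for row in feats for f in row]
    # Content: mean of each cell feature over all cells.
    content = [average([f[i] for f in cells]) for i in range(3)]
    # Structure: how much cells change along a row versus down a column.
    horizontal = [
        l1(row[j], row[j + 1]) for row in feats for j in range(len(row) - 1)
    ]
    vertical = [
        l1(upper[j], lower[j])
        for upper, lower in zip(feats, feats[1:])
        for j in range(min(len(upper), len(lower)))
    ]
    return content + [average(horizontal), average(vertical)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


people = [["Name", "Age"], ["Ann", "31"], ["Bob", "45"]]
cities = [["City", "Pop"], ["Oslo", "700"], ["Rome", "2800"]]
fruits = [["apples"], ["bananas"], ["cherries"]]

print(cosine(table_vector(people), table_vector(cities)))
print(cosine(table_vector(people), table_vector(fruits)))
```

In this sketch the two relational tables score as more similar to each other than either does to the single-column list, because their columns are internally uniform while their rows mix text and numbers. Vectors of this kind could then be grouped with any standard clustering algorithm, which is one way an unsupervised classifier might avoid domain annotations.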

