CL, AI | Oct 14, 2020

A Graph Representation of Semi-structured Data for Web Question Answering

arXiv:2010.06801v1994 citations
Originality: Incremental advance
AI Analysis

This work aims to improve question answering accuracy for commercial search engines by making better use of semi-structured web data such as tables and lists. It is an incremental advance in the field.

The paper tackles the problem of leveraging semantic information in web tables and lists for question answering by proposing a novel graph representation and associated techniques, resulting in a 3.90-point F1 score improvement over state-of-the-art baselines.

The abundant semi-structured data on the Web, such as HTML-based tables and lists, provide commercial search engines a rich information source for question answering (QA). Different from plain text passages in Web documents, Web tables and lists have inherent structures, which carry semantic correlations among various elements in tables and lists. Many existing studies treat tables and lists as flat documents with pieces of text and do not make good use of semantic information hidden in structures. In this paper, we propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations. We also develop pre-training and reasoning techniques on the graph model for the QA task. Extensive experiments on several real datasets collected from a commercial engine verify the effectiveness of our approach. Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
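To make the contrast between a flat document and a graph concrete, here is a minimal Python sketch of a table stored as typed nodes and edges. It is an illustration only, not the paper's model: the abstract does not list the authors' component or relation categories, so the names `TableGraph`, `table_to_graph`, `lookup`, `header_of`, and `same_row` are all hypothetical, as is the assumption that the first column holds each row's key entity.

```python
from dataclasses import dataclass, field


@dataclass
class TableGraph:
    """Toy graph over a table: nodes are headers and cells, edges are typed relations."""
    nodes: dict = field(default_factory=dict)  # node id -> (kind, text)
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node_id, kind, text):
        self.nodes[node_id] = (kind, text)

    def add_edge(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def neighbors(self, node_id, relation):
        """Return targets reachable from node_id over edges of the given relation."""
        return [dst for src, rel, dst in self.edges
                if src == node_id and rel == relation]


def table_to_graph(headers, rows):
    """Build the toy graph (hypothetical relation names, not the paper's)."""
    graph = TableGraph()
    for col, header in enumerate(headers):
        graph.add_node(("header", col), "header", header)
    for r, row in enumerate(rows):
        for c, text in enumerate(row):
            cell = ("cell", r, c)
            graph.add_node(cell, "cell", text)
            # Each cell is linked to the header of its column.
            graph.add_edge(("header", c), "header_of", cell)
            # Assumption: column 0 is the row's key; link it to the other cells.
            if c > 0:
                graph.add_edge(("cell", r, 0), "same_row", cell)
    return graph


def lookup(graph, key_text, header_text):
    """Answer "what is <header> of <key>?" by following edges, not scanning text."""
    for node_id, (kind, text) in graph.nodes.items():
        if kind == "cell" and node_id[2] == 0 and text == key_text:
            for other in graph.neighbors(node_id, "same_row"):
                column = other[2]
                if graph.nodes[("header", column)][1] == header_text:
                    return graph.nodes[other][1]
    return None


graph = table_to_graph(
    ["Country", "Capital", "Currency"],
    [["France", "Paris", "Euro"], ["Japan", "Tokyo", "Yen"]],
)
answer = lookup(graph, "Japan", "Capital")
```

In a flattened version of the same table, "Tokyo" is just one token among many. In the graph, it is tied explicitly to both its row key and its column header, which is the kind of structural signal the abstract says flat treatments fail to use.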

