CL · AI · LG · Nov 6, 2018

Parser Extraction of Triples in Unstructured Text

arXiv:1811.05768v1 · 7 citations
Originality: Incremental advance
AI Analysis

This work improves knowledge graph construction for semantic web applications and question answering, but the advance appears incremental because it builds on existing parsing methods.

The paper tackles the problem of extracting subject-predicate-object triples from unstructured web text to build knowledge graphs. It reports 2 to 2.5 times as many correct extractions as the ReVerb baseline and verifies the extraction of 50,000 triples on the ClueWeb dataset.

The web contains vast repositories of unstructured text. We investigate the opportunity for building a knowledge graph from these text sources. We generate a set of triples which can be used in knowledge gathering and integration. We define the architecture of a language compiler for processing subject-predicate-object triples using the OpenNLP parser. We implement a depth-first search traversal on the POS tagged syntactic tree appending predicate and object information. A parser enables higher precision and higher recall extractions of syntactic relationships across conjunction boundaries. We are able to extract 2-2.5 times the correct extractions of ReVerb. The extractions are used in a variety of semantic web applications and question answering. We verify extraction of 50,000 triples on the ClueWeb dataset.
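
To make the traversal idea concrete, here is a minimal sketch of a depth-first walk over a POS-tagged parse tree that gathers a subject, predicate, and objects and emits one triple per conjoined object. It is not the paper's implementation and does not call OpenNLP: the nested-tuple tree format, the helper names (`collect`, `extract_triples`), and the hand-built `EXAMPLE_TREE` are all assumptions made for illustration, and treating every noun as its own subject or object is a deliberate simplification.

```python
# Illustrative sketch only. The tree encoding and all function names are
# hypothetical; they are not the paper's code and not the OpenNLP API.

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS", "PRP"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def is_leaf(node):
    # A leaf is (POS tag, word); an internal node is (label, [children]).
    return isinstance(node[1], str)


def find_child(node, label):
    # Return the first direct child carrying the given label, or None.
    for child in node[1]:
        if child[0] == label:
            return child
    return None


def collect(node, tags, skip=()):
    """Depth-first, left-to-right collection of words whose POS tag is in
    `tags`. Subtrees whose label is in `skip` are not entered."""
    if is_leaf(node):
        return [node[1]] if node[0] in tags else []
    words = []
    for child in node[1]:
        if child[0] in skip:
            continue
        words.extend(collect(child, tags, skip))
    return words


def extract_triples(sentence):
    """Return (subject, predicate, object) tuples for one sentence tree."""
    np = find_child(sentence, "NP")
    vp = find_child(sentence, "VP")
    if np is None or vp is None:
        return []
    subjects = collect(np, NOUN_TAGS)
    # Verbs are taken from the VP itself, not from its noun or
    # prepositional phrases.
    verbs = collect(vp, VERB_TAGS, skip=("NP", "PP"))
    # Nouns anywhere under the VP count as objects, so objects joined by
    # a conjunction each yield their own triple.
    objects = collect(vp, NOUN_TAGS)
    if not verbs:
        return []
    predicate = " ".join(verbs)
    return [(s, predicate, o) for s in subjects for o in objects]


# Hand-built tree for "The parser extracts subjects and objects".
EXAMPLE_TREE = ("S", [
    ("NP", [("DT", "The"), ("NN", "parser")]),
    ("VP", [
        ("VBZ", "extracts"),
        ("NP", [
            ("NP", [("NNS", "subjects")]),
            ("CC", "and"),
            ("NP", [("NNS", "objects")]),
        ]),
    ]),
])

if __name__ == "__main__":
    for triple in extract_triples(EXAMPLE_TREE):
        print(triple)
```

On the example tree, the conjunction inside the object noun phrase produces two triples that share one subject and predicate, which is the kind of extraction across conjunction boundaries the abstract describes. A real pipeline would take its trees from a parser and would need to handle compound nouns, nested clauses, and prepositions rather than flattening every noun.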

Code Implementations: 1 repo