Ontology Guided Information Extraction from Unstructured Text
This work addresses the challenge of enhancing semantic querying for domain-specific applications by structuring unstructured text, though it appears incremental in its approach.
The paper tackles the problem of populating existing ontologies with instance information from unstructured text, achieving 95% accuracy in information extraction by using heuristics to extract semantic triples and convert them into RDF.
In this paper, we describe an approach to populate an existing ontology with instance information present in natural language text provided as input. An ontology is commonly defined as a formal, explicit specification of a shared conceptualization of a domain. The approach starts from a set of relevant domain ontologies created by human experts and applies techniques for identifying the ontology most appropriate to extend with information from a given text. Identifying the relevant ontology is critical, because its concepts are then used to locate relevant information in the text. We then present heuristics that extract information from the unstructured text in the form of semantic triples, guided by the concepts in the ontology, and add it as structured information to the selected ontology. Specifically, we convert the extracted information about instances of the ontology's classes into the Resource Description Framework (RDF) and append it to the existing domain ontology. The resulting semantic triple store supports more precise semantic queries. Our implementation achieves 95% accuracy in information extraction.
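The pipeline described above can be illustrated with a toy Python sketch covering three stages: selecting an ontology, extracting ontology-guided triples, and serializing them as RDF. This is a sketch under stated assumptions, not the authors' implementation. The Ontology class, the functions select_ontology, extract_triples, and to_ntriples, and the example.org namespaces are all hypothetical. Ontology selection is approximated here by counting occurrences of class labels and property trigger phrases, and the paper's extraction heuristics are replaced by two simple regular-expression sentence patterns. The output uses N-Triples, which is one standard RDF serialization.

```python
# Hypothetical sketch: label-overlap ontology selection plus two toy regex
# patterns stand in for the paper's (unspecified here) selection and extraction heuristics.
from __future__ import annotations

import re
from dataclasses import dataclass

# Full IRI of rdf:type from the W3C RDF vocabulary.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Ontology:
    """Hypothetical, minimal stand-in for an expert-built domain ontology."""
    name: str
    base: str                   # namespace IRI used for classes, properties, instances
    classes: dict[str, str]     # class name -> label, e.g. "Drug" -> "drug"
    properties: dict[str, str]  # property name -> trigger phrase, e.g. "treats" -> "treats"


def select_ontology(text: str, ontologies: list[Ontology]) -> Ontology:
    """Return the ontology whose labels and triggers occur most often in the text."""
    lowered = text.lower()

    def score(onto: Ontology) -> int:
        terms = [*onto.classes.values(), *onto.properties.values()]
        return sum(len(re.findall(rf"\b{re.escape(t.lower())}\b", lowered)) for t in terms)

    return max(ontologies, key=score)  # ties resolve to the first ontology listed


def extract_triples(text: str, onto: Ontology) -> list[tuple[str, str, str]]:
    """Apply two toy sentence patterns built from the ontology's vocabulary."""
    triples = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        sentence = sentence.rstrip(".!?")
        # Pattern 1: "<Entity> is a/an <class label>" -> (entity, rdf:type, class)
        for cls, label in onto.classes.items():
            m = re.fullmatch(rf"(\w+) is an? {re.escape(label)}", sentence, re.IGNORECASE)
            if m:
                triples.append((m.group(1), RDF_TYPE, cls))
        # Pattern 2: "<Entity> <trigger> <Entity>" -> (subject, property, object)
        for prop, trigger in onto.properties.items():
            m = re.fullmatch(rf"(\w+) {re.escape(trigger)} (\w+)", sentence, re.IGNORECASE)
            if m:
                triples.append((m.group(1), prop, m.group(2)))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]], onto: Ontology) -> str:
    """Serialize triples as N-Triples, minting IRIs in the ontology's namespace."""
    def iri(local: str) -> str:
        return f"<{onto.base}{local}>"

    lines = []
    for s, p, o in triples:
        pred = f"<{RDF_TYPE}>" if p == RDF_TYPE else iri(p)
        lines.append(f"{iri(s)} {pred} {iri(o)} .")
    return "\n".join(lines)


# Usage with two hypothetical ontologies.
medical = Ontology("medical", "http://example.org/med#",
                   classes={"Drug": "drug", "Disease": "disease"},
                   properties={"treats": "treats"})
finance = Ontology("finance", "http://example.org/fin#",
                   classes={"Bank": "bank", "Loan": "loan"},
                   properties={"issues": "issues"})

text = "Aspirin is a drug. Influenza is a disease. Aspirin treats Fever."
onto = select_ontology(text, [medical, finance])
print(onto.name)  # prints: medical
print(to_ntriples(extract_triples(text, onto), onto))
```

Because N-Triples is line-based, appending such lines to a file that already holds the ontology's triples in the same format is one simple way to extend it. A fuller system would more likely use an RDF library and a triple store that can be queried with SPARQL.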