CL

Oct 31, 2018

Extracting Linguistic Resources from the Web for Concept-to-Text Generation

Gerasimos Lampouras, Ion Androutsopoulos

arXiv:1810.13414v10.22 citations

Originality: Incremental advance

AI Analysis

This work addresses the tedious and costly manual construction of linguistic resources for concept-to-text generation systems. It offers a semi-automatic alternative, an incremental advance that reduces the effort required of developers and users of natural language generation systems.

The paper tackles the manual construction of domain-specific linguistic resources for concept-to-text generation by proposing methods to extract sentence plans and natural language names from the Web, specifically for the NaturalOWL generator. In the experiments, texts generated with these semi-automatically extracted resources were perceived as almost as good as texts generated with manually authored resources, and much better than texts generated with resources derived from the ontology's relation and entity identifiers.

Many concept-to-text generation systems require domain-specific linguistic resources to produce high-quality texts, but manually constructing these resources can be tedious and costly. Focusing on NaturalOWL, a publicly available state-of-the-art natural language generator for OWL ontologies, we propose methods to extract from the Web sentence plans and natural language names, two of the most important types of domain-specific linguistic resources used by the generator. Experiments show that texts generated using linguistic resources extracted by our methods in a semi-automatic manner, with minimal human involvement, are perceived as being almost as good as texts generated using manually authored linguistic resources, and much better than texts produced by using linguistic resources extracted from the relation and entity identifiers of the ontology.
