CL · Oct 31, 2018

Generating Texts with Integer Linear Programming

Gerasimos Lampouras, Ion Androutsopoulos

arXiv:1811.00051v1

Originality: Highly original

AI Analysis

This work addresses the need for compact text generation in applications such as Web search advertisements or summaries that must fit in limited space, offering a novel joint optimization approach in place of traditional pipelines.

The paper tackles suboptimal text generation in pipeline architectures by proposing an Integer Linear Programming model that jointly optimizes content selection, lexicalization, and sentence aggregation to produce more compact texts, i.e., texts that report more facts per word. Experiments show more compact texts with no loss, and in some cases gains, in perceived quality.

Concept-to-text generation typically employs a pipeline architecture, which often leads to suboptimal texts. Content selection, for example, may greedily select the most important facts, which may, however, require too many words to express; this may be undesirable when space is limited or expensive. Selecting other facts, possibly only slightly less important, may allow the lexicalization stage to use far fewer words, or to report more facts in the same space. Decisions made during content selection and lexicalization may also lead to more or fewer sentence aggregation opportunities, affecting the length and readability of the resulting texts. Building upon a publicly available state-of-the-art natural language generator for Semantic Web ontologies, this article presents an Integer Linear Programming model that, unlike pipeline architectures, jointly considers choices available in content selection, lexicalization, and sentence aggregation to avoid greedy local decisions and produce more compact texts, i.e., texts that report more facts per word. Compact texts are desirable, for example, when generating advertisements to be included in Web search results, or when summarizing structured information in limited space. An extended version of the proposed model also considers a limited form of referring expression generation and avoids redundant sentences. An approximation of the two models can be used when longer texts need to be generated. Experiments with three ontologies confirm that the proposed models lead to more compact texts than pipeline systems, with no deterioration in, or even improvements to, the perceived quality of the generated texts.
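The abstract's central argument is that committing to the most important facts first can waste space that a joint decision would spend on more facts. A toy instance can illustrate this. The sketch below is not the authors' ILP model. The facts, importance scores, word counts, budget, and objective (maximize the number of facts reported within a word budget, then total importance) are all hypothetical. Sentence aggregation is omitted, and exhaustive enumeration stands in for an ILP solver, which is only feasible for tiny instances like this one.

```python
"""Toy joint content selection + lexicalization vs. a greedy pipeline.

All data and the objective are hypothetical, for illustration only;
this is not the ILP model from the paper.
"""
from itertools import product
from typing import NamedTuple


class Fact(NamedTuple):
    name: str
    importance: float             # hypothetical importance score
    lex_lengths: tuple[int, ...]  # word counts of alternative lexicalizations


# Hypothetical facts and word budget.
FACTS = [
    Fact("A", 0.90, (8, 7)),
    Fact("B", 0.85, (6,)),
    Fact("C", 0.80, (3, 4)),
    Fact("D", 0.75, (4,)),
    Fact("E", 0.70, (5, 3)),
]
BUDGET = 12  # maximum number of words


def greedy_pipeline(facts, budget):
    """Commit to facts in descending importance; keep each one only if
    its shortest lexicalization still fits in the remaining budget."""
    chosen, used = [], 0
    for f in sorted(facts, key=lambda f: f.importance, reverse=True):
        cost = min(f.lex_lengths)
        if used + cost <= budget:
            chosen.append(f.name)
            used += cost
    return chosen, used


def joint_selection(facts, budget):
    """Exhaustively search all (fact, lexicalization) assignments.

    Each fact is either omitted (None) or expressed with exactly one
    lexicalization, mirroring binary variables x[i][k] with
    sum_k x[i][k] <= 1 plus a word-budget constraint. Objective
    (hypothetical): maximize the number of facts, then total importance.
    """
    best_key, best = None, None
    options = [[None, *range(len(f.lex_lengths))] for f in facts]
    for assignment in product(*options):
        pairs = [(f, k) for f, k in zip(facts, assignment) if k is not None]
        words = sum(f.lex_lengths[k] for f, k in pairs)
        if words > budget:
            continue  # violates the word-budget constraint
        key = (len(pairs), sum(f.importance for f, _ in pairs))
        if best_key is None or key > best_key:
            best_key, best = key, ([f.name for f, _ in pairs], words)
    return best


if __name__ == "__main__":
    for label, (names, words) in [
        ("greedy", greedy_pipeline(FACTS, BUDGET)),
        ("joint", joint_selection(FACTS, BUDGET)),
    ]:
        print(f"{label}: {names}, {words} words, "
              f"{len(names) / words:.2f} facts/word")
    # greedy: ['A', 'C'], 10 words, 0.20 facts/word
    # joint: ['B', 'C', 'E'], 12 words, 0.25 facts/word
```

The greedy baseline spends 7 of its 12 words on the single most important fact and then cannot fit B or D. The joint search gives up A and reports three slightly less important facts in the same space.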

