CL Apr 5, 2019

PoMo: Generating Entity-Specific Post-Modifiers in Context

Jun Seok Kang, Robert L. Logan, Zewei Chu, Yang Chen, Dheeru Dua, Kevin Gimpel, Sameer Singh, Niranjan Balasubramanian

arXiv:1904.03111v231.01095 citations h-index: 67

Originality: Incremental advance

AI Analysis

The paper addresses a collaborative writing problem in journalism by automating the inclusion of relevant entity information in news articles, though the advance is incremental because it adapts existing generation methods.

The paper tackles the task of generating entity-specific post-modifier phrases in context, such as adding 'a father of two girls' to a sentence about Barack Obama. It introduces the PoMo dataset, with more than 231K sentences, and shows that knowing which facts are relevant yields a >20% improvement in BLEU score.

We introduce entity post-modifier generation as an instance of a collaborative writing task. Given a sentence about a target entity, the task is to automatically generate a post-modifier phrase that provides contextually relevant information about the entity. For example, for the sentence, "Barack Obama, _______, supported the #MeToo movement.", the phrase "a father of two girls" is a contextually relevant post-modifier. To this end, we build PoMo, a post-modifier dataset created automatically from news articles reflecting a journalistic need for incorporating entity information that is relevant to a particular news event. PoMo consists of more than 231K sentences with post-modifiers and associated facts extracted from Wikidata for around 57K unique entities. We use crowdsourcing to show that modeling contextual relevance is necessary for accurate post-modifier generation. We adapt a number of existing generation approaches as baselines for this dataset. Our results show there is large room for improvement in terms of both identifying relevant facts to include (knowing which claims are relevant gives a >20% improvement in BLEU score), and generating appropriate post-modifier text for the context (providing relevant claims is not sufficient for accurate generation). We conduct an error analysis that suggests promising directions for future research.
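To make the task format concrete, the following is a minimal sketch of what a single instance might look like, assuming a simple record with a slotted sentence, a target entity, a list of (property, value) claims, and a reference post-modifier. The class name, field names, and the `fill_slot` helper are hypothetical and are not taken from the PoMo release; the claims shown are illustrative placeholders rather than entries copied from the dataset or from Wikidata.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Marker used in the abstract's example sentence for the missing phrase.
SLOT = "_______"


@dataclass
class PostModifierExample:
    """One hypothetical task instance; all field names are illustrative."""
    sentence: str                  # context sentence containing SLOT
    entity: str                    # target entity the phrase describes
    claims: List[Tuple[str, str]]  # (property, value) facts about the entity
    post_modifier: str             # reference phrase a model should produce


def fill_slot(example: PostModifierExample, phrase: str) -> str:
    """Insert a candidate post-modifier into the first slot of the sentence."""
    return example.sentence.replace(SLOT, phrase, 1)


example = PostModifierExample(
    sentence="Barack Obama, _______, supported the #MeToo movement.",
    entity="Barack Obama",
    # Placeholder claims for illustration only, not copied from the dataset.
    claims=[("occupation", "politician"), ("number of children", "2")],
    post_modifier="a father of two girls",
)

print(fill_slot(example, example.post_modifier))
```

Under this framing, a model would receive `sentence`, `entity`, and `claims`, choose the claims that matter for the context, and produce a phrase to compare against `post_modifier`.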
