CL | Oct 16, 2020

Substance over Style: Document-Level Targeted Content Transfer

arXiv:2010.08618v1 | 993 citations | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of making language generation more attuned to substantive constraints rather than stylistic ones. The advance is incremental because the work is confined to a single domain.

The paper tackles the problem of rewriting entire documents to fit targeted constraints, using recipes as the documents and dietary restrictions as the constraints. Its model outperforms existing methods, generating coherent and diverse rewrites that obey the constraint while staying close to the original content.

Existing language models excel at writing from scratch, but many real-world scenarios require rewriting an existing document to fit a set of constraints. Although sentence-level rewriting has been fairly well-studied, little work has addressed the challenge of rewriting an entire document coherently. In this work, we introduce the task of document-level targeted content transfer and address it in the recipe domain, with a recipe as the document and a dietary restriction (such as vegan or dairy-free) as the targeted constraint. We propose a novel model for this task based on the generative pre-trained language model (GPT-2) and train on a large number of roughly-aligned recipe pairs (https://github.com/microsoft/document-level-targeted-content-transfer). Both automatic and human evaluations show that our model out-performs existing methods by generating coherent and diverse rewrites that obey the constraint while remaining close to the original document. Finally, we analyze our model's rewrites to assess progress toward the goal of making language generation more attuned to constraints that are substantive rather than stylistic.

Code Implementations: 1 repo
