CL · May 20, 2018

Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models

arXiv:1805.07731v11092 citations
Originality: Incremental advance
AI Analysis

This work improves surface realization, a text generation task in natural language processing, but the advance is incremental: it builds on an existing shared task and existing methods.

The paper tackles the generation of high-quality surface realizations from obfuscated text. It addresses data scarcity by generating synthetic training data, and it adds preprocessing techniques that make the structure of the input features more accessible to sequence models. The resulting models ranked first on all evaluation metrics in the English portion of the 2018 Surface Realization shared task.

This work presents a new state of the art in reconstruction of surface realizations from obfuscated text. We identify the lack of sufficient training data as the major obstacle to training high-performing models, and solve this issue by generating large amounts of synthetic training data. We also propose preprocessing techniques which make the structure contained in the input features more accessible to sequence models. Our models were ranked first on all evaluation metrics in the English portion of the 2018 Surface Realization shared task.

