Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models
This work improves surface realization, a text generation step used in natural language processing applications. The contribution is incremental, since it builds on an existing shared task and existing methods.
The paper addresses the problem of generating high-quality surface realizations from obfuscated text. It counters data scarcity with synthetic data generation and preprocessing techniques, and its models ranked first on all evaluation metrics in the English portion of the 2018 Surface Realization shared task.
This work presents a new state of the art in reconstruction of surface realizations from obfuscated text. We identify the lack of sufficient training data as the major obstacle to training high-performing models, and solve this issue by generating large amounts of synthetic training data. We also propose preprocessing techniques which make the structure contained in the input features more accessible to sequence models. Our models were ranked first on all evaluation metrics in the English portion of the 2018 Surface Realization shared task.
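To make the idea of synthetic training data concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that an obfuscated input can be approximated by lemmatizing a sentence and discarding its word order, so that any annotated sentence yields a (scrambled lemmas, original sentence) training pair. The function name `make_synthetic_pair` and the hand-written lemmas are hypothetical; in practice the lemmas would come from a parser or lemmatizer.

```python
import random


def make_synthetic_pair(tokens, seed=0):
    """Build one (source, target) pair from an annotated sentence.

    tokens: list of (surface_form, lemma) tuples in original word order.
    Assumption: obfuscation is approximated by lemmatizing and shuffling.
    """
    rng = random.Random(seed)  # seeded so the shuffle is reproducible
    lemmas = [lemma for _, lemma in tokens]
    rng.shuffle(lemmas)  # discard the original word order
    source = " ".join(lemmas)  # obfuscated model input
    target = " ".join(form for form, _ in tokens)  # sentence to reconstruct
    return source, target


# Hypothetical hand-annotated example sentence.
sentence = [("The", "the"), ("cats", "cat"), ("were", "be"), ("sleeping", "sleep")]
src, tgt = make_synthetic_pair(sentence, seed=13)
print(src)
print(tgt)
```

Applied to a large corpus of automatically annotated text, a procedure of this kind would produce as many training pairs as there are sentences, which is the sense in which synthetic data can offset a small shared-task training set.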