Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity
This work addresses data-to-text generation from structured inputs and offers insights into input representations and output diversity. Its contribution is incremental, however: it compares existing methods rather than introducing a new paradigm.
The paper compares word-based and character-based sequence-to-sequence models for data-to-text natural language generation. On two datasets, the models achieve automatic evaluation results comparable to or better than the best challenge submissions, and a controlled experiment shows that neural models can generalize beyond their training data by learning novel template combinations.
We present a comparison of word-based and character-based sequence-to-sequence models for data-to-text natural language generation, which generate natural language descriptions for structured inputs. On the datasets of two recent generation challenges, our models achieve comparable or better automatic evaluation results than the best challenge submissions. Subsequent detailed statistical and human analyses shed light on the differences between the two input representations and the diversity of the generated texts. In a controlled experiment with synthetic training data generated from templates, we demonstrate the ability of neural models to learn novel combinations of the templates and thereby generalize beyond the linguistic structures they were trained on.
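To make the distinction between the two input representations concrete, the following is a minimal sketch of how one text could be segmented at the word level and at the character level. The meaning representation, the `attribute[value]` linearization format, the reference sentence, and the helper names `linearize`, `word_tokens`, and `char_tokens` are all hypothetical illustrations. They are not taken from the paper or its datasets, and the paper's actual preprocessing may differ.

```python
import re


def linearize(mr):
    """Flatten an attribute-value dict into one input string (assumed format)."""
    return ", ".join(f"{attr}[{value}]" for attr, value in mr.items())


def word_tokens(text):
    """Word-level view: runs of word characters, with punctuation split off."""
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokens(text):
    """Character-level view: every character, including spaces, is a token."""
    return list(text)


# Hypothetical structured input and reference text, invented for illustration.
mr = {"name": "Blue Door", "food": "Italian"}
source = linearize(mr)
target = "Blue Door serves Italian food."

print(source)
print(word_tokens(target))
print(len(word_tokens(target)), "word tokens vs.", len(char_tokens(target)), "character tokens")
```

In this sketch the character-level sequence is several times longer than the word-level one, but it draws on a small, closed symbol inventory, whereas the word-level vocabulary grows with every new name or value in the data.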