Data-driven Natural Language Generation: Paving the Road to Success
This work addresses bottlenecks in commercializing statistical machine learning for natural language generation, but the contribution is incremental because it builds on already known evaluation and data issues.
The paper tackles the lack of reliable automatic evaluation metrics and the scarcity of high-quality in-domain corpora for natural language generation, analysing current metrics to motivate the need for a new, more reliable one and presenting a framework for corpus development and evaluation.
We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) the lack of reliable automatic evaluation metrics for NLG, and (b) the scarcity of high-quality in-domain corpora. We address the first problem by thoroughly analysing current evaluation metrics and motivating the need for a new, more reliable metric. We address the second problem by presenting a novel framework for developing and evaluating a high-quality corpus for NLG training.