Neural Machine Translation For Paraphrase Generation
This work addresses the high cost of data annotation for skill developers on voice assistants such as Alexa, though its contribution is incremental, since it applies an existing method to a new domain.
The authors address the expense of manual annotation for spoken language understanding systems by building an automatic paraphrase generation system based on neural machine translation. Training on the paraphrase-augmented data improved intent and named entity classification accuracy, as well as sentence-level coverage, for unseen skills.
Training a spoken language understanding system, such as the one in Alexa, typically requires a large human-annotated corpus of data. Manual annotations are expensive and time-consuming. In the Alexa Skills Kit (ASK), the user experience with a skill depends greatly on the amount of data provided by the skill developer. In this work, we present an automatic natural language generation system capable of generating both human-like interactions and annotations by means of paraphrasing. Our approach consists of a machine translation (MT) inspired encoder-decoder deep recurrent neural network. We evaluate our model by its impact on ASK skill, intent, and named entity classification accuracy and on sentence-level coverage. Natural language understanding (NLU) models trained on data augmented with paraphrases show significant improvements on all of these metrics for unseen skills.
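The augmentation step described above can be illustrated with a minimal sketch. It is not the system from this work: `toy_paraphraser` is a hypothetical rule-based stand-in for the trained encoder-decoder, the `Annotated` data layout is an assumption, and `sentence_coverage` assumes that coverage means the fraction of test sentences appearing verbatim in the training data, which the abstract does not define.

```python
from typing import Callable, Iterable

# Assumed layout: (intent, tuple of (token, slot_label) pairs).
Annotated = tuple[str, tuple[tuple[str, str], ...]]


def augment(seed: Iterable[Annotated],
            paraphrase: Callable[[Annotated], Iterable[Annotated]]) -> list[Annotated]:
    """Return the seed data plus de-duplicated paraphrases, preserving order."""
    seen: set[Annotated] = set()
    out: list[Annotated] = []
    for example in seed:
        for candidate in (example, *paraphrase(example)):
            if candidate not in seen:
                seen.add(candidate)
                out.append(candidate)
    return out


def sentence_coverage(train: Iterable[Annotated],
                      test_sentences: Iterable[str]) -> float:
    """Fraction of test sentences whose token string appears verbatim in train.

    This definition is an assumption made for illustration.
    """
    known = {" ".join(tok for tok, _ in pairs) for _, pairs in train}
    tests = list(test_sentences)
    if not tests:
        return 0.0
    return sum(s in known for s in tests) / len(tests)


def toy_paraphraser(example: Annotated) -> list[Annotated]:
    """Hypothetical stand-in for the trained model.

    Rewrites a leading "play" as "put on" and keeps the slot labels of the
    remaining tokens, so the paraphrase arrives already annotated.
    """
    intent, pairs = example
    if pairs and pairs[0][0] == "play":
        return [(intent, (("put", "O"), ("on", "O")) + pairs[1:])]
    return []


seed = [("PlayMusic", (("play", "O"), ("jazz", "Genre")))]
augmented = augment(seed, toy_paraphraser)
tests = ["play jazz", "put on jazz"]
print(sentence_coverage(seed, tests), sentence_coverage(augmented, tests))
# prints: 0.5 1.0
```

Because each paraphrase carries its intent and slot labels with it, the augmented set can be passed to NLU training without a further annotation pass, which is the property the abstract emphasizes.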