CL · Mar 22, 2019

Pre-trained Language Model Representations for Language Generation

Sergey Edunov, Alexei Baevski, Michael Auli

arXiv:1903.09722v2

Originality: Incremental advance

AI Analysis

This work targets language generation tasks such as translation and summarization, which makes it relevant to NLP practitioners. It is an incremental advance because it builds on existing pre-trained models.

The paper studies how to integrate pre-trained language model representations into sequence-to-sequence models for neural machine translation and abstractive summarization. It finds that adding them to the encoder yields gains of up to 5.3 BLEU in a simulated resource-poor setup, and it achieves a new state of the art on the full-text version of CNN/DailyMail summarization.

Pre-trained language model representations have been successful in a wide range of language understanding tasks. In this paper, we examine different strategies to integrate pre-trained representations into sequence-to-sequence models and apply them to neural machine translation and abstractive summarization. We find that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%. Our experiments in machine translation show gains of up to 5.3 BLEU in a simulated resource-poor setup. While returns diminish with more labeled data, we still observe improvements when millions of sentence pairs are available. Finally, on abstractive summarization we achieve a new state of the art on the full-text version of CNN/DailyMail.
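The abstract does not spell out how the representations are "added to the encoder". One widely used option, assumed here purely for illustration, is an ELMo-style scalar mix: each position's hidden states from the pre-trained language model's layers are combined with softmax-normalized learned weights, scaled by a learned factor gamma, and then fed to the encoder alongside its token embeddings. The function names (`scalar_mix`, `encoder_inputs`) and the choice to concatenate with the token embedding are hypothetical; this is a dependency-free sketch, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layer_states: List[List[Vector]], scalars: List[float],
               gamma: float) -> List[Vector]:
    """ELMo-style mix: gamma * sum_j softmax(scalars)_j * layer_states[j].

    layer_states[j][t] is the hidden vector of LM layer j at position t.
    """
    if len(layer_states) != len(scalars):
        raise ValueError("need one scalar weight per LM layer")
    weights = softmax(scalars)
    seq_len = len(layer_states[0])
    dim = len(layer_states[0][0])
    mixed: List[Vector] = []
    for t in range(seq_len):
        vec = [0.0] * dim
        for w, layer in zip(weights, layer_states):
            for d in range(dim):
                vec[d] += w * layer[t][d]
        mixed.append([gamma * v for v in vec])
    return mixed


def encoder_inputs(token_embeddings: List[Vector],
                   lm_layer_states: List[List[Vector]],
                   scalars: List[float], gamma: float) -> List[Vector]:
    """Hypothetical encoder input: token embedding concatenated with the
    mixed LM representation at each position."""
    mixed = scalar_mix(lm_layer_states, scalars, gamma)
    return [emb + lm for emb, lm in zip(token_embeddings, mixed)]


if __name__ == "__main__":
    # Two LM layers, two positions, 2-dimensional hidden states.
    layers = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[3.0, 2.0], [2.0, 3.0]],
    ]
    # Equal scalars give uniform weights (0.5 each).
    print(scalar_mix(layers, [0.0, 0.0], gamma=1.0))
    print(encoder_inputs([[9.0], [8.0]], layers, [0.0, 0.0], gamma=1.0))
```

With equal scalars the mix is a plain average of the layers, so the first call prints `[[2.0, 1.0], [1.0, 2.0]]`. In a real model the scalars and gamma would be trained with the translation or summarization objective, letting the encoder learn which LM layers are most useful.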
