CL · Mar 22, 2019

Pre-trained Language Model Representations for Language Generation

arXiv:1903.09722v21200 citations
Originality: Incremental advance
AI Analysis

This paper aims to improve language generation tasks such as translation and summarization for NLP practitioners. It is an incremental advance because it builds on existing pre-trained models.

The paper tackles integrating pre-trained language model representations into sequence-to-sequence models for neural machine translation and abstractive summarization, finding that adding them to the encoder yields gains of up to 5.3 BLEU in resource-poor setups and achieves a new state of the art on CNN/DailyMail summarization.

Pre-trained language model representations have been successful in a wide range of language understanding tasks. In this paper, we examine different strategies to integrate pre-trained representations into sequence-to-sequence models and apply them to neural machine translation and abstractive summarization. We find that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%. Our experiments in machine translation show gains of up to 5.3 BLEU in a simulated resource-poor setup. While returns diminish with more labeled data, we still observe improvements when millions of sentence pairs are available. Finally, on abstractive summarization we achieve a new state of the art on the full-text version of CNN/DailyMail.

Code Implementations: 1 repo
