Exploring Transformers in Natural Language Generation: GPT, BERT, and XLNet
This work provides a comparative analysis of key Transformer models. Its contribution is incremental, since it synthesizes existing advances rather than introducing new methods.
The paper explores Transformer-based models such as GPT, BERT, and XLNet, which address limitations of older Natural Language Generation (NLG) architectures, including vanishing gradients and lack of parallelization. It reports that these models achieve groundbreaking results in tasks like poetry generation and summarization.
Recent years have seen a proliferation of attention mechanisms and the rise of Transformers in Natural Language Generation (NLG). Previously, state-of-the-art NLG architectures such as RNNs and LSTMs ran into vanishing gradient problems: as sentences grew longer, the number of sequential steps separating two positions grew linearly with the distance between them, and sequential computation hindered parallelization because sentences were processed word by word. Transformers usher in a new era. In this paper, we explore three major Transformer-based models, namely GPT, BERT, and XLNet, that carry significant implications for the field. NLG is a burgeoning area that is now bolstered by rapid developments in attention mechanisms. From poetry generation to summarization, text generation benefits as Transformer-based language models achieve groundbreaking results.
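To make the contrast between recurrence and attention concrete, the following is a minimal sketch in pure Python. It is an illustration, not code from the paper: it assumes single-head scaled dot-product self-attention as used in the original Transformer, without the masking, multiple heads, or learned projections that GPT, BERT, and XLNet add on top. The function names `scaled_dot_product_attention`, `max_path_length_recurrent`, and `max_path_length_self_attention` are hypothetical. The path-length helpers encode the usual comparison: a recurrent network needs n - 1 sequential steps to connect the first and last of n tokens, whereas self-attention connects any two positions in a single step.

```python
# Hypothetical illustration: single-head scaled dot-product self-attention
# (no masking, no multi-head, no learned projections), not the paper's code.
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V, one output row per query row."""
    d_k = len(k[0])
    out: Matrix = []
    # Each iteration reads only q, k, and v, never a previous output row,
    # so all positions could be computed in parallel (unlike an RNN loop).
    for q_row in q:
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
            for k_row in k
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return out


def max_path_length_recurrent(n: int) -> int:
    # Hypothetical helper: in an RNN, a signal from the first of n tokens
    # reaches the last only after n - 1 sequential hidden-state updates.
    return n - 1


def max_path_length_self_attention(n: int) -> int:
    # Hypothetical helper: in self-attention, every position attends to
    # every other position directly, so the path length is constant.
    return 1


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(len(y), len(y[0]))  # 3 2
print(max_path_length_recurrent(512), max_path_length_self_attention(512))  # 511 1
```

Because each query row is computed independently of the others, the loop can be replaced by a single matrix multiplication on parallel hardware; that independence is what removes the word-by-word bottleneck of RNNs and LSTMs.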