LG AI CL | Oct 3, 2022

A Non-monotonic Self-terminating Language Model

Eugene Choi, Kyunghyun Cho, Cheolhyoung Lee

arXiv:2210.00660v33.31 citations | h-index: 96 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a specific issue in natural language generation, non-terminating output sequences, that matters to both researchers and practitioners. The advance is incremental because it builds directly on prior work by Welleck et al. (2020).

The paper tackles non-terminating sequences in neural autoregressive language models decoded with algorithms such as greedy search and sampling. It proposes a non-monotonic self-terminating language model that relaxes the requirement that the termination probability increase monotonically. The model is proven to prevent non-terminating sequences under incomplete probable decoding algorithms (greedy search, top-$k$ sampling, nucleus sampling) and under beam search, and it is validated empirically on sequence completion tasks.

Recent large-scale neural autoregressive sequence models have shown impressive performance on a variety of natural language generation tasks. However, their generated sequences often exhibit degenerate properties such as non-termination, undesirable repetition, and premature termination when generated with decoding algorithms such as greedy search, beam search, top-$k$ sampling, and nucleus sampling. In this paper, we focus on the problem of non-terminating sequences resulting from an incomplete decoding algorithm. We first define an incomplete probable decoding algorithm which includes greedy search, top-$k$ sampling, and nucleus sampling, beyond the incomplete decoding algorithm originally put forward by Welleck et al. (2020). We then propose a non-monotonic self-terminating language model, which significantly relaxes the constraint of monotonically increasing termination probability in the originally proposed self-terminating language model by Welleck et al. (2020), to address the issue of non-terminating sequences when using incomplete probable decoding algorithms. We prove that our proposed model prevents non-terminating sequences when using not only incomplete probable decoding algorithms but also beam search. We empirically validate our model on sequence completion tasks with various architectures.
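
The idea in the abstract can be illustrated with a toy sketch. It assumes a two-token vocabulary (an end-of-sequence token and one other token), so greedy search terminates exactly when the end-of-sequence probability exceeds 1/2. The parametrization below, a convex combination of the lower bound 1 - (1 - eps)^t and 1, is an assumption chosen to match the abstract's description (a termination probability that is bounded from below but not forced to increase monotonically); this page does not confirm it as the paper's exact formula. The names `toy_logit`, `bounded_eos_prob`, and `greedy_length` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_logit(t):
    # Hypothetical end-of-sequence logit: it oscillates but stays negative,
    # so the unconstrained end-of-sequence probability never reaches 1/2.
    return -3.0 + math.sin(t)


def eos_lower_bound(t, eps):
    # Assumed lower bound: increases monotonically toward 1 as t grows.
    return 1.0 - (1.0 - eps) ** t


def bounded_eos_prob(t, eps):
    # Assumed parametrization: interpolate between the lower bound and 1.
    # The result is always >= the lower bound, but it may go up or down
    # from one step to the next because the logit is unconstrained.
    s = sigmoid(toy_logit(t))
    return (1.0 - s) * eos_lower_bound(t, eps) + s


def greedy_length(eos_prob_fn, max_steps):
    # With two tokens, greedy search picks end-of-sequence as soon as
    # its probability exceeds 1/2. Returns None if that never happens.
    for t in range(1, max_steps + 1):
        if eos_prob_fn(t) > 0.5:
            return t
    return None


if __name__ == "__main__":
    eps = 0.01
    print("unconstrained:", greedy_length(lambda t: sigmoid(toy_logit(t)), 1000))
    print("lower-bounded:", greedy_length(lambda t: bounded_eos_prob(t, eps), 1000))
```

In this toy setting the unconstrained model never terminates under greedy search within the step budget, while the lower-bounded model must terminate once the bound itself passes 1/2, even though its termination probability is allowed to dip between steps.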
