LG | ML | Jun 4, 2020

MLE-guided parameter search for task loss minimization in neural sequence modeling

arXiv:2006.03158v2 | 9 citations
AI Analysis

This paper addresses the challenge of directly optimizing task losses in NLP sequence modeling, offering an incremental alternative to existing methods such as policy gradient and minimum risk training.

The paper tackles the problem of optimizing sequence-level task losses in neural autoregressive models by introducing maximum likelihood guided parameter search (MGS), which shifts sampling to the parameter space and pools losses from multiple sequences. In sequence completion, MGS substantially reduces repetition and non-termination; in machine translation, it yields improvements comparable to those of minimum risk training.

Neural autoregressive sequence models are used to generate sequences in a variety of natural language processing (NLP) tasks, where they are evaluated according to sequence-level task losses. These models are typically trained with maximum likelihood estimation, which ignores the task loss, yet empirically performs well as a surrogate objective. Typical approaches to directly optimizing the task loss such as policy gradient and minimum risk training are based around sampling in the sequence space to obtain candidate update directions that are scored based on the loss of a single sequence. In this paper, we develop an alternative method based on random search in the parameter space that leverages access to the maximum likelihood gradient. We propose maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around the current parameters and around the maximum likelihood gradient, with each direction weighted by its improvement in the task loss. MGS shifts sampling to the parameter space, and scores candidates using losses that are pooled from multiple sequences. Our experiments show that MGS is capable of optimizing sequence-level losses, with substantial reductions in repetition and non-termination in sequence completion, and similar improvements to those of minimum risk training in machine translation.
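The update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mgs_step`, the softmax weighting with a `temperature`, the fixed noise scale `sigma`, the step size `alpha`, and the toy loss are all assumptions made for illustration, and the exact weighting used in the paper may differ. Each candidate direction is drawn either around zero (random search around the current parameters) or around a step along the negative maximum likelihood gradient, and directions are combined with weights that grow with their improvement in the task loss.

```python
import math
import random


def mgs_step(theta, task_loss, mle_grad, n_candidates=8, p_mle=0.5,
             sigma=0.1, alpha=0.5, temperature=0.1, rng=random):
    """One illustrative MGS-style update (hypothetical helper, not the paper's code)."""
    base = task_loss(theta)
    grad = mle_grad(theta)
    deltas, improvements = [], []
    for _ in range(n_candidates):
        # Mixture: centre the direction on the MLE step with probability p_mle,
        # otherwise on zero (plain random search around the current parameters).
        if rng.random() < p_mle:
            centre = [-alpha * g for g in grad]
        else:
            centre = [0.0] * len(theta)
        delta = [c + rng.gauss(0.0, sigma) for c in centre]
        candidate = [t + d for t, d in zip(theta, delta)]
        deltas.append(delta)
        improvements.append(base - task_loss(candidate))
    # Assumed weighting: softmax over task-loss improvements (numerically stabilised).
    top = max(improvements)
    exps = [math.exp((v - top) / temperature) for v in improvements]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Move along the weighted combination of the sampled directions.
    new_theta = [t + sum(w * d[j] for w, d in zip(weights, deltas))
                 for j, t in enumerate(theta)]
    return new_theta, weights


# Toy stand-ins (assumptions): a piecewise-constant "task loss" that gives no
# useful gradient, and the gradient of a squared-error surrogate playing the
# role of the maximum likelihood gradient.
TARGET = [3.0, -2.0]


def toy_task_loss(theta):
    return sum(abs(round(t) - y) for t, y in zip(theta, TARGET))


def toy_mle_grad(theta):
    return [2.0 * (t - y) for t, y in zip(theta, TARGET)]


if __name__ == "__main__":
    theta = [0.0, 0.0]
    for _ in range(20):
        theta, _ = mgs_step(theta, toy_task_loss, toy_mle_grad)
    print(theta, toy_task_loss(theta))
```

In an actual sequence model, `task_loss` would pool losses over several decoded sequences, as the abstract describes, rather than evaluate a closed-form toy function.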

Code Implementations: 1 repo