Minimum Risk Training for Neural Machine Translation
This work addresses the challenge of optimizing non-differentiable evaluation metrics in machine translation. The method is incremental, but it is transparent to architectures and potentially beneficial for a broader range of NLP tasks.
The paper tackles the problem of training neural machine translation models by proposing minimum risk training, which directly optimizes parameters for arbitrary evaluation metrics, achieving significant improvements over maximum likelihood estimation across various language pairs.
We propose minimum risk training for end-to-end neural machine translation. Unlike conventional maximum likelihood estimation, minimum risk training is capable of optimizing model parameters directly with respect to arbitrary evaluation metrics, which are not necessarily differentiable. Experiments show that our approach achieves significant improvements over maximum likelihood estimation on a state-of-the-art neural machine translation system across various language pairs. Because it is transparent to architectures, our approach can be applied to other neural networks and could potentially benefit other NLP tasks.
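To make "optimizing directly with respect to a non-differentiable metric" concrete, here is a minimal sketch of a minimum-risk objective for a single source sentence. It rests on several assumptions that are not stated in the abstract above. The intractable expectation over all translations is approximated by a small set of candidate translations. Their model log-probabilities are rescaled by a sharpness hyperparameter alpha and renormalized into a distribution Q. The risk is the expected loss under Q. The loss function `toy_loss` is a hypothetical stand-in for a real metric such as 1 minus sentence-level BLEU, and all names here are illustrative. The gradient is taken with respect to the candidates' log-probabilities, and it only needs loss *values*, never loss derivatives:

```python
import math
from collections import Counter


def toy_loss(hyp, ref):
    """Hypothetical stand-in for 1 - sentence BLEU: 1 - clipped unigram precision.

    Any black-box metric works here; it is never differentiated.
    """
    if not hyp:
        return 1.0
    ref_counts = Counter(ref)
    matched = sum(min(c, ref_counts[w]) for w, c in Counter(hyp).items())
    return 1.0 - matched / len(hyp)


def mrt_risk_and_grad(log_probs, losses, alpha=1.0):
    """Expected loss under Q(y) proportional to p(y|x)^alpha over a candidate set.

    Returns (risk, grad), where grad[k] = dRisk / d log p(y_k|x)
    = alpha * Q_k * (loss_k - risk). In a real system this would be
    backpropagated into the network through log p(y_k|x).
    """
    scaled = [alpha * lp for lp in log_probs]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    q = [w / z for w in weights]  # renormalized distribution over candidates
    risk = sum(qi * d for qi, d in zip(q, losses))
    grad = [alpha * qi * (d - risk) for qi, d in zip(q, losses)]
    return risk, grad


# Usage: three hypothetical candidates with made-up log-probabilities.
ref = "the cat sat on the mat".split()
candidates = [
    ("the cat sat on the mat".split(), -2.0),
    ("a cat sat on a mat".split(), -1.5),
    ("the dog ran".split(), -3.0),
]
losses = [toy_loss(hyp, ref) for hyp, _ in candidates]
risk, grad = mrt_risk_and_grad([lp for _, lp in candidates], losses)
print(f"risk={risk:.4f}", [f"{g:+.4f}" for g in grad])
```

The gradient has a simple interpretation. A candidate whose loss is below the current expected risk gets a negative gradient, so gradient descent raises its probability. A candidate whose loss is above the risk gets a positive gradient, so its probability is lowered. Because the weights Q sum to one, the gradients sum to zero.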