Counterfactual Learning for Machine Translation: Degeneracies and Solutions
This work addresses reliability challenges in offline learning for web-based translation services, but it appears incremental: it builds on prior methods rather than introducing a new paradigm.
The paper analyzes degeneracies of counterfactual learning estimators for machine translation, focusing on inverse and reweighted propensity scoring under stochastic and deterministic logging policies, and relates these degeneracies to existing techniques for learning under deterministic logging.
Counterfactual learning is a natural scenario for improving web-based machine translation services by learning offline from feedback logged during user interactions. To avoid the risk of showing inferior translations to users, such scenarios mostly rely on exploration-free, deterministic logging policies. We analyze possible degeneracies of inverse and reweighted propensity scoring estimators, in stochastic and deterministic settings, and relate them to recently proposed techniques for counterfactual learning under deterministic logging.
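
The two estimator families named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook forms of inverse propensity scoring and its reweighted (self-normalized) variant; the paper's exact objectives may differ. The function names `ips_estimate` and `reweighted_ips_estimate` and the toy log are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def ips_estimate(rewards: Sequence[float],
                 target_probs: Sequence[float],
                 logging_probs: Sequence[float]) -> float:
    """Inverse propensity scoring: mean of reward * (target prob / logging prob)."""
    n = len(rewards)
    return sum(r * p / q
               for r, p, q in zip(rewards, target_probs, logging_probs)) / n


def reweighted_ips_estimate(rewards: Sequence[float],
                            target_probs: Sequence[float],
                            logging_probs: Sequence[float]) -> float:
    """Reweighted variant: importance weights are normalized by their sum."""
    weights = [p / q for p, q in zip(target_probs, logging_probs)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all importance weights are zero")
    return sum(r * w for r, w in zip(rewards, weights)) / total


# Hypothetical log of three translations with non-negative rewards.
# Deterministic logging is modeled here as a propensity of 1 for each
# logged output (an assumption of this sketch).
rewards = [0.2, 0.5, 0.9]
deterministic_log = [1.0, 1.0, 1.0]

# Target policy A puts probability 0.5 on every logged output,
# target policy B puts probability 1.0 on every logged output.
ips_a = ips_estimate(rewards, [0.5, 0.5, 0.5], deterministic_log)
ips_b = ips_estimate(rewards, [1.0, 1.0, 1.0], deterministic_log)

# Target policy C puts all its mass on the best-rewarded logged output.
rw_uniform = reweighted_ips_estimate(rewards, [0.5, 0.5, 0.5], deterministic_log)
rw_peaked = reweighted_ips_estimate(rewards, [0.0, 0.0, 1.0], deterministic_log)
```

In this toy setting, `ips_b` exceeds `ips_a` although policy B merely raises the probability of every logged translation, including the low-reward one. The reweighted estimate is a weighted average of the logged rewards, so it is largest when all weight sits on the single best logged sample, as in `rw_peaked`. Both behaviors are instances of the kind of degeneracy the abstract refers to, shown here under the stated assumptions rather than as the paper's own derivation.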