CLMay 5, 2018

Exploring Hyper-Parameter Optimization for Neural Machine Translation on GPU Architectures

arXiv:1805.02094v27 citations
Originality Synthesis-oriented
AI Analysis

This work addresses the challenge of efficiently tuning hyper-parameters for NMT systems, which is crucial for developers and researchers aiming to optimize translation models on modern hardware, though it appears incremental as it builds on existing methods like Marian NMT.

This research tackled the problem of hyper-parameter optimization for neural machine translation on GPU architectures by exploring various settings and comparing performance metrics like words per second and translation accuracy, revealing insights into which hyper-parameters most impact system performance.

Neural machine translation (NMT) has been accelerated by deep learning neural networks over statistical-based approaches, due to the plethora and programmability of commodity heterogeneous computing architectures such as FPGAs and GPUs and the massive amount of training corpuses generated from news outlets, government agencies and social media. Training a learning classifier for neural networks entails tuning hyper-parameters that would yield the best performance. Unfortunately, the number of parameters for machine translation include discrete categories as well as continuous options, which makes for a combinatorial explosive problem. This research explores optimizing hyper-parameters when training deep learning neural networks for machine translation. Specifically, our work investigates training a language model with Marian NMT. Results compare NMT under various hyper-parameter settings across a variety of modern GPU architecture generations in single node and multi-node settings, revealing insights on which hyper-parameters matter most in terms of performance, such as words processed per second, convergence rates, and translation accuracy, and provides insights on how to best achieve high-performing NMT systems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes