CL · Apr 6, 2021

ODE Transformer: An Ordinary Differential Equation-Inspired Model for Neural Machine Translation

arXiv:2104.02308v1 · 25 citations
Originality: Incremental advance
AI Analysis

This work targets researchers and practitioners seeking better machine translation quality. It is an incremental advance because it builds directly on the Transformer architecture.

The paper aims to improve neural machine translation with ODE Transformer, a new architecture inspired by numerical methods for ordinary differential equations. It reaches BLEU scores of 30.76 on WMT'14 En-De and 44.11 on WMT'14 En-Fr; the En-Fr result sets a new state of the art.

It has been found that residual networks are an Euler discretization of solutions to Ordinary Differential Equations (ODEs). In this paper, we explore a deeper relationship between Transformer and numerical methods of ODEs. We show that a residual block of layers in Transformer can be described as a higher-order solution to ODEs. This leads us to design a new architecture (called ODE Transformer) analogous to the Runge-Kutta method, which is well motivated in ODEs. As a natural extension to Transformer, ODE Transformer is easy to implement and parameter efficient. Our experiments on three WMT tasks demonstrate the genericity of this model and large improvements in performance over several strong baselines. It achieves 30.76 and 44.11 BLEU scores on the WMT'14 En-De and En-Fr test data. This sets a new state of the art on the WMT'14 En-Fr task.
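The Euler-versus-Runge-Kutta analogy in the abstract can be made concrete with a toy example. This is a minimal sketch, assuming a stand-in function `f` in place of a Transformer sub-layer; the names `euler_block`, `rk2_block`, `rk4_block`, and `axpy` are hypothetical and do not come from the paper's code. A residual connection `y + f(y)` is one explicit Euler step with step size 1. The RK2 (Heun) and classical RK4 blocks evaluate the same `f` several times per block, which is one way a higher-order block could gain accuracy without adding parameters, assuming (as the parameter-efficiency claim suggests) that the repeated evaluations share weights. It shows textbook Runge-Kutta updates, not the authors' exact implementation.

```python
import math
from typing import Callable, List

# Assumption: f stands in for a Transformer sub-layer; this is an
# illustration of the ODE analogy, not the paper's implementation.
Vector = List[float]
Layer = Callable[[Vector], Vector]


def axpy(y: Vector, d: Vector, scale: float = 1.0) -> Vector:
    """Return y + scale * d elementwise."""
    return [yi + scale * di for yi, di in zip(y, d)]


def euler_block(y: Vector, f: Layer) -> Vector:
    """Standard residual connection y + f(y): one explicit Euler step, h = 1."""
    return axpy(y, f(y))


def rk2_block(y: Vector, f: Layer) -> Vector:
    """Second-order Runge-Kutta (Heun) step; the same f is applied twice."""
    f1 = f(y)
    f2 = f(axpy(y, f1))
    return [yi + 0.5 * (a + b) for yi, a, b in zip(y, f1, f2)]


def rk4_block(y: Vector, f: Layer) -> Vector:
    """Classical fourth-order Runge-Kutta step; the same f is applied four times."""
    f1 = f(y)
    f2 = f(axpy(y, f1, 0.5))
    f3 = f(axpy(y, f2, 0.5))
    f4 = f(axpy(y, f3))
    return [
        yi + (a + 2 * b + 2 * c + d) / 6
        for yi, a, b, c, d in zip(y, f1, f2, f3, f4)
    ]


if __name__ == "__main__":
    # Hypothetical toy "layer": the linear ODE dy/dt = -0.5 * y.
    def decay(y: Vector) -> Vector:
        return [-0.5 * v for v in y]

    y0 = [1.0]
    exact = math.exp(-0.5)  # true solution after one unit step
    for name, block in [("euler", euler_block), ("rk2", rk2_block), ("rk4", rk4_block)]:
        y1 = block(y0, decay)[0]
        print(f"{name}: {y1:.6f}  error={abs(y1 - exact):.2e}")
```

On the linear test equation dy/dt = -0.5y with y(0) = 1, the exact value after one unit step is exp(-0.5) ≈ 0.6065. The Euler, RK2, and RK4 blocks give 0.5, 0.625, and about 0.6068, so the error shrinks as the order of the method rises while `f` itself stays unchanged.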
