CL · LG · NE · Jul 8, 2019

An Intrinsic Nearest Neighbor Analysis of Neural Machine Translation Architectures

arXiv:1907.03885v11093 citations
Originality: Synthesis-oriented
AI Analysis

This work gives machine translation researchers an intrinsic analysis method for comparing model architectures. Its contribution is incremental, since it builds on prior extrinsic evaluations.

The paper analyzes how neural machine translation models capture lexical semantics and syntactic structure, comparing transformer and recurrent architectures with an intrinsic nearest neighbor approach. It finds that transformers are superior at capturing lexical semantics but not necessarily syntax, and that in recurrent models the backward layer learns more about word semantics while the forward layer encodes more context.

Earlier approaches indirectly studied the information captured by the hidden states of recurrent and non-recurrent neural machine translation models by feeding them into different classifiers. In this paper, we look at the encoder hidden states of both transformer and recurrent machine translation models from the nearest neighbors perspective. We investigate to what extent the nearest neighbors share information with the underlying word embeddings as well as related WordNet entries. Additionally, we study the underlying syntactic structure of the nearest neighbors to shed light on the role of syntactic similarities in bringing the neighbors together. We compare transformer and recurrent models in a more intrinsic way in terms of capturing lexical semantics and syntactic structures, in contrast to extrinsic approaches used by previous works. In agreement with the extrinsic evaluations in the earlier works, our experimental results show that transformers are superior in capturing lexical semantics, but not necessarily better in capturing the underlying syntax. Additionally, we show that the backward recurrent layer in a recurrent model learns more about the semantics of words, whereas the forward recurrent layer encodes more context.
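The comparison between hidden-state neighbors and word-embedding neighbors described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_knn` and `neighbor_overlap`, the use of cosine similarity, and the simplification that row i of both matrices refers to the same item are all assumptions made for illustration, and the data in the usage lines is random.

```python
import numpy as np


def cosine_knn(vectors: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k most cosine-similar rows for each row, excluding the row itself."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # a row is never its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]


def neighbor_overlap(hidden: np.ndarray, embeddings: np.ndarray, k: int) -> float:
    """Mean fraction of k nearest neighbors shared between the two spaces.

    Assumption (hypothetical setup): row i of `hidden` and row i of
    `embeddings` describe the same item, so neighbor indices are comparable.
    """
    hidden_nn = cosine_knn(hidden, k)
    embed_nn = cosine_knn(embeddings, k)
    shared = [len(set(h) & set(e)) / k for h, e in zip(hidden_nn, embed_nn)]
    return float(np.mean(shared))


# Usage with random stand-in data (not real model states).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 16))
hidden = embeddings + 0.5 * rng.normal(size=(50, 16))  # noisy copy of the embeddings
print(neighbor_overlap(hidden, embeddings, k=5))
```

A higher overlap under this sketch would suggest that the hidden states stay close to the lexical information in the embeddings, while a lower overlap would suggest that other information, such as context, shapes the neighborhoods.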

