CL · May 16, 2018

Semantic Relatedness for All (Languages): A Comparative Analysis of Multilingual Semantic Relatedness Using Machine Translation

Andre Freitas, Siamak Barzegar, Juliano Efson Sales, Siegfried Handschuh, Brian Davis

arXiv:1805.06522v1 · 15 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of semantic relatedness across multiple languages, but it is incremental as it applies existing methods to new data.

The paper evaluates multilingual semantic relatedness by comparing native language-specific models against machine translation into English followed by English-based models. It finds that the machine translation approach improves Spearman correlation by an average of 16.7%.

This paper provides a comparative analysis of the performance of four state-of-the-art distributional semantic models (DSMs) over 11 languages, contrasting the native language-specific models with the use of machine translation over English-based DSMs. The experimental results show that there is a significant improvement (average of 16.7% for the Spearman correlation) by using state-of-the-art machine translation approaches. The results also show that the benefit of using the most informative corpus outweighs the possible errors introduced by the machine translation. For all languages, the combination of machine translation over the Word2Vec English distributional model provided the best results consistently (average Spearman correlation of 0.68).
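The evaluation pipeline described in the abstract (translate each word pair into English, score it with an English distributional model, then compare against human judgments using Spearman correlation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the translation table, the three-dimensional vectors, and the gold scores are all made-up toy values, and the helper names (`relatedness_via_translation`, `spearman`) are hypothetical. A real setup would use a machine translation service and a trained model such as Word2Vec.

```python
import math

# Toy stand-in for a machine translation step (hypothetical German-to-English entries).
TRANSLATE_TO_EN = {
    "Auto": "car",
    "Wagen": "automobile",
    "Strasse": "road",
    "Banane": "banana",
}

# Toy stand-in for an English distributional model (made-up 3-d vectors).
ENGLISH_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.1],
    "road": [0.6, 0.1, 0.5],
    "banana": [0.0, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def relatedness_via_translation(word1, word2):
    """Translate both words to English, then score them with the English model."""
    en1 = TRANSLATE_TO_EN[word1]
    en2 = TRANSLATE_TO_EN[word2]
    return cosine(ENGLISH_VECTORS[en1], ENGLISH_VECTORS[en2])


# Made-up gold standard: (word1, word2, human relatedness score).
GOLD = [
    ("Auto", "Wagen", 9.5),
    ("Auto", "Strasse", 6.0),
    ("Auto", "Banane", 1.0),
]

model_scores = [relatedness_via_translation(a, b) for a, b, _ in GOLD]
human_scores = [score for _, _, score in GOLD]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation on toy data: {rho:.3f}")
```

The native-model baseline in the comparison would replace the translation lookup with vectors trained directly on a corpus in the target language; the Spearman step stays the same in both cases.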
