CL, LG, ML · Jan 25, 2020

An Analysis of Word2Vec for the Italian Language

arXiv:2001.09332v1
AI Analysis

This work addresses the scarcity of NLP resources for non-English languages such as Italian. The contribution is incremental, however, since it applies an existing method (Word2Vec) to new data.

The authors tackle the lack of word embeddings for Italian by training one with Word2Vec. They explore parameters such as the number of epochs and the context window size to analyze the embedding's semantic capacity.
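To make the context-window parameter concrete, here is a minimal sketch of how skip-gram training pairs are drawn from a sentence. It is an illustration, not the authors' code. The function name `skipgram_pairs` and the example sentence are hypothetical. The sketch assumes a fixed, symmetric window. The reference Word2Vec implementation additionally samples a smaller effective window for each word, and that step is omitted here.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (center, context) pairs from a fixed, symmetric context window.

    Every token is paired with each token at most `window` positions away on
    either side; the center token itself is skipped. Illustrative only: the
    reference Word2Vec implementation also shrinks the window at random.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    sentence = "il gatto dorme sul divano".split()
    for w in (1, 2):
        print(f"window={w}: {len(skipgram_pairs(sentence, w))} pairs")
```

On the five-word example sentence, widening the window from 1 to 2 raises the number of training pairs from 8 to 14. A larger window therefore means more (and more topical, less syntactic) context per word, at a higher training cost.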

Word representation is fundamental in NLP tasks, because it is precisely by encoding the semantic closeness between words that it becomes possible to teach a machine to understand text. Despite the spread of word embeddings, few results exist for languages other than English. In this work, an embedding for the Italian language is produced while analysing the semantic capacity of the Word2Vec algorithm. Parameter settings such as the number of epochs, the size of the context window, and the number of negative samples backpropagated are explored.
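The sketch below shows where the three parameters named in the abstract (epochs, context window, negative samples) enter training. It implements a toy skip-gram with negative sampling (SGNS) in pure Python under stated assumptions. It is not the paper's implementation. The function names (`train_sgns`, `cosine`), the parameter names, and all default values are hypothetical and are not the settings reported by the authors. Negatives are drawn from the unigram distribution raised to the 3/4 power, as in the original Word2Vec work. Subsampling of frequent words, learning-rate decay, and the random window shrinking are omitted for brevity.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    # Clamp to avoid math.exp overflow for large |x|.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def train_sgns(
    sentences: List[List[str]],
    dim: int = 10,
    window: int = 2,     # max distance between center and context word
    negative: int = 5,   # negative samples drawn per (center, context) pair
    epochs: int = 5,     # full passes over the corpus
    lr: float = 0.025,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Toy SGNS trainer (hypothetical helper); returns the input word vectors."""
    rng = random.Random(seed)
    counts = Counter(w for s in sentences for w in s)
    vocab = sorted(counts)
    # Input vectors start small and random; output (context) vectors start at zero.
    w_in = {w: [(rng.random() - 0.5) / dim for _ in range(dim)] for w in vocab}
    w_out = {w: [0.0] * dim for w in vocab}
    # Noise distribution for negatives: unigram counts raised to the 3/4 power.
    weights = [counts[w] ** 0.75 for w in vocab]

    for _ in range(epochs):
        for tokens in sentences:
            for i, center in enumerate(tokens):
                start, stop = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(start, stop):
                    if j == i:
                        continue
                    # One positive target (label 1) plus sampled negatives (label 0).
                    targets: List[Tuple[str, float]] = [(tokens[j], 1.0)]
                    for neg in rng.choices(vocab, weights=weights, k=negative):
                        if neg != tokens[j]:
                            targets.append((neg, 0.0))
                    v = w_in[center]
                    grad_v = [0.0] * dim
                    for word, label in targets:
                        u = w_out[word]
                        score = sum(a * b for a, b in zip(v, u))
                        g = lr * (label - sigmoid(score))
                        for k in range(dim):
                            grad_v[k] += g * u[k]  # uses u before its update
                            u[k] += g * v[k]       # update the output vector
                    for k in range(dim):
                        v[k] += grad_v[k]          # update the input vector
    return w_in


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    corpus = [s.split() for s in ["il gatto dorme sul divano",
                                  "il cane dorme sul tappeto"] * 50]
    vecs = train_sgns(corpus, dim=16, window=2, negative=5, epochs=5)
    print("cos(gatto, cane) =", round(cosine(vecs["gatto"], vecs["cane"]), 3))
```

Each of the three parameters scales the work differently. `epochs` multiplies the number of corpus passes. `window` multiplies the number of (center, context) pairs per word. `negative` multiplies the number of dot products per pair. For real corpora, gensim's `Word2Vec` class exposes parameters with the same roles (`window`, `negative`, `epochs`).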
