DIS-NN CL NE SOC-PH | Jul 15, 2015

Language discrimination and clustering via a neural network approach

Angelo Mariano, Giorgio Parisi, Saverio Pascazio

arXiv:1507.04116v11.2

Originality: Synthesis-oriented

AI Analysis

This work addresses language classification, a problem in computational linguistics, but it is incremental: it applies existing neural network methods to a specific dataset.

The paper tackles the classification of twenty-one Indo-European languages from written text. It uses neural networks to define distances between languages and to construct a dendrogram; cutting the dendrogram according to an entropic criterion yields four or five subgroups.

We classify twenty-one Indo-European languages starting from written text. We use neural networks in order to define a distance among different languages, construct a dendrogram and analyze the ultrametric structure that emerges. Four or five subgroups of languages are identified, according to the "cut" of the dendrogram, drawn with an entropic criterion. The results and the method are discussed.
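The pipeline in the abstract (pairwise distances, then a dendrogram, then a cut into subgroups) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not say which linkage rule the authors use, so average linkage (UPGMA) is swapped in; the labels `A` to `D` and the numbers in `toy` are hypothetical, not data from the paper; and the cut height is a fixed threshold, not the paper's entropic criterion. The neural-network step that produces the distances is not modelled at all.

```python
from itertools import combinations


def average_linkage(labels, dist):
    """Agglomerative clustering with average linkage (UPGMA).

    labels: list of item names.
    dist:   dict mapping frozenset({a, b}) to the distance between a and b.
    Returns (merges, ultra): the merges as (cluster_a, cluster_b, height),
    and the dendrogram-induced (cophenetic) distance for every pair.
    """
    clusters = [frozenset([x]) for x in labels]

    def d(c1, c2):
        # Mean pairwise distance between members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges, ultra = [], {}
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: d(*p))
        h = d(c1, c2)
        merges.append((c1, c2, h))
        # Items joined at this step sit at dendrogram distance h.
        for a in c1:
            for b in c2:
                ultra[frozenset((a, b))] = h
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges, ultra


def is_ultrametric(labels, ultra, tol=1e-12):
    """Check d(a, c) <= max(d(a, b), d(b, c)) for every triple.

    Equivalent form: in every triangle the two largest sides are equal.
    """
    for a, b, c in combinations(labels, 3):
        sides = sorted(ultra[frozenset(p)] for p in ((a, b), (b, c), (a, c)))
        if sides[2] - sides[1] > tol:
            return False
    return True


def cut(labels, merges, height):
    """Return the groups obtained by cutting the dendrogram at `height`."""
    groups = {x: frozenset([x]) for x in labels}
    for c1, c2, h in merges:
        if h > height:
            continue
        merged = c1 | c2
        for x in merged:
            groups[x] = merged
    return sorted({tuple(sorted(g)) for g in groups.values()})


# Hypothetical toy distances (NOT from the paper): two tight pairs, far apart.
labels = ["A", "B", "C", "D"]
toy = {
    frozenset(("A", "B")): 0.20,
    frozenset(("C", "D")): 0.30,
    frozenset(("A", "C")): 0.80,
    frozenset(("A", "D")): 0.90,
    frozenset(("B", "C")): 0.85,
    frozenset(("B", "D")): 0.95,
}

merges, ultra = average_linkage(labels, toy)
print(is_ultrametric(labels, ultra))  # True
print(cut(labels, merges, 0.5))       # [('A', 'B'), ('C', 'D')]
```

Moving the cut height changes the number of subgroups, which is the sense in which the abstract's "four or five subgroups" depends on where the dendrogram is cut.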
