A Computational Analysis of Natural Languages to Build a Sentence Structure Aware Artificial Neural Network
This work addresses the problem of language identification for computational linguistics, but it appears incremental as it builds on existing analyses of language similarities.
The paper tackles the problem of distinguishing languages by analyzing two morphological aspects, written patterns and sentence structure, and develops an Artificial Neural Network that uses sentence structure to identify languages, suggesting that grammatical structure alone suffices for this task.
Natural languages are complexly structured entities. They exhibit characteristic regularities that can be exploited to link them to one another. In this work, I compare two morphological aspects of languages: Written Patterns and Sentence Structure. I show how languages spontaneously group by similarity in both analyses and derive an average language distance. Finally, exploiting Sentence Structure, I developed an Artificial Neural Network capable of distinguishing languages, suggesting that not only word roots but also grammatical sentence structure is a characterising trait that alone suffices to identify them.
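To make the idea of a pairwise language distance concrete, the following is a minimal sketch and not the method used in this work. It assumes that a "written pattern" can be approximated by a character bigram frequency profile and that the distance between two languages can be taken as the cosine distance between their profiles. The function names (`ngram_profile`, `cosine_distance`, `distance_matrix`) and the toy text samples are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def ngram_profile(text, n=2):
    """Relative frequencies of character n-grams in a text sample."""
    cleaned = " ".join(text.lower().split())  # normalise case and whitespace
    counts = Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def cosine_distance(p, q):
    """1 - cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(gram, 0.0) for gram, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 1.0  # assumption: an empty profile is maximally distant
    return 1.0 - dot / (norm_p * norm_q)


def distance_matrix(samples, n=2):
    """Symmetric pairwise distances, keyed by (language, language)."""
    profiles = {lang: ngram_profile(text, n) for lang, text in samples.items()}
    dist = {}
    for a, b in combinations(sorted(profiles), 2):
        d = cosine_distance(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Toy samples (assumed, far too short for any real conclusion).
samples = {
    "english": "the cat sits on the table and looks at the garden",
    "italian": "il gatto siede sul tavolo e guarda il giardino",
    "spanish": "el gato se sienta en la mesa y mira el jardin",
}

distances = distance_matrix(samples)
for (a, b), d in sorted(distances.items()):
    if a < b:
        print(f"{a:8s} {b:8s} {d:.3f}")
```

A matrix of this kind could then be passed to any standard clustering procedure to see whether languages group by similarity. With realistic corpora the profiles would be built from much larger samples, and a comparable matrix for Sentence Structure would need a different feature, for example sequences of part-of-speech tags, which this sketch does not attempt.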