Word Embeddings for the Armenian Language: Intrinsic and Extrinsic Evaluation
This work addresses the need for language-specific resources in Armenian NLP, though its contribution is incremental, since it applies existing methods to a new language.
The authors evaluate and compare word embedding models for the Armenian language, train new embeddings with algorithms such as GloVe and fastText, and establish benchmarks for morphological tagging and text classification, with publicly released datasets.
In this work, we evaluate and compare existing word embedding models for the Armenian language, both intrinsically and extrinsically. We also present new embeddings trained with the GloVe, fastText, CBOW, and SkipGram algorithms. For intrinsic evaluation, we adapt the word analogy task to Armenian. For extrinsic evaluation, we use two tasks: morphological tagging and text classification. Tagging is performed with a deep neural network on the ArmTDP v2.3 dataset. For text classification, we introduce a corpus of news articles categorized into 7 classes. The datasets are made public to serve as benchmarks for future models.
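To make the intrinsic evaluation concrete, the following is a minimal sketch of how a word analogy question ("a is to b as c is to ?") is commonly scored with the vector-offset (3CosAdd) method. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `solve_analogy` and `analogy_accuracy`, the three-dimensional toy vectors, the English placeholder words, and the choice to skip questions containing out-of-vocabulary words are all hypothetical.

```python
import numpy as np


def solve_analogy(vectors, a, b, c):
    """Return the vocabulary word closest (by cosine) to b - a + c.

    The three query words are excluded from the candidates, as is
    customary for the analogy task.
    """
    words = list(vectors)
    index = {w: i for i, w in enumerate(words)}
    matrix = np.array([vectors[w] for w in words], dtype=float)
    # Unit-normalize rows so that dot products equal cosine similarities.
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)

    target = matrix[index[b]] - matrix[index[a]] + matrix[index[c]]
    target /= np.linalg.norm(target)

    scores = matrix @ target
    for w in (a, b, c):
        scores[index[w]] = -np.inf  # never return a query word
    return words[int(np.argmax(scores))]


def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, d) questions answered correctly.

    Assumption: questions with any out-of-vocabulary word are skipped.
    """
    covered = [q for q in questions if all(w in vectors for w in q)]
    if not covered:
        return 0.0
    correct = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in covered)
    return correct / len(covered)


# Toy, hand-made vectors (hypothetical; real embeddings have hundreds of dimensions).
toy_vectors = {
    "man": [1, 0, 0],
    "woman": [1, 1, 0],
    "king": [1, 0, 1],
    "queen": [1, 1, 1],
    "child": [0, 1, 0],
    "apple": [0, 0, -1],
}

questions = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
    ("man", "woman", "prince", "princess"),  # skipped: out of vocabulary
]

print(solve_analogy(toy_vectors, "man", "woman", "king"))
print(analogy_accuracy(toy_vectors, questions))
```

With real embeddings, `vectors` would be loaded from a trained model and `questions` from the analogy dataset. How out-of-vocabulary questions are treated (skipped, as here, or counted as errors) changes the reported accuracy, so that choice should be stated alongside any results.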