An Ensemble Method for Producing Word Representations focusing on the Greek Language
This work addresses the need for better word embeddings for modern Greek. It is an incremental advance that combines existing methods.
The paper tackles the problem of producing high-quality word representations for modern Greek by introducing an ensemble method, Continuous Bag-of-Skip-grams (CBOS), which combines the CBOW and Continuous Skip-gram approaches. CBOS is reported to achieve state-of-the-art performance on intrinsic and extrinsic evaluation tasks across three corpora: English Wikipedia, modern Greek Wikipedia, and modern Greek web content.
In this paper we present a new ensemble method, Continuous Bag-of-Skip-grams (CBOS), that produces high-quality word representations with an emphasis on the modern Greek language. CBOS combines the two pioneering approaches to learning word representations: Continuous Bag-of-Words (CBOW) and Continuous Skip-gram. The three methods are compared through intrinsic and extrinsic evaluation tasks on three data sources: the English Wikipedia corpus, the modern Greek Wikipedia corpus, and the modern Greek Web Content corpus. Across these tasks and datasets, the comparison shows that CBOS achieves state-of-the-art performance.
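The abstract does not say how CBOS merges its two components, so the sketch below only illustrates how the component methods differ in the training examples they draw from a context window. CBOW predicts the centre word from its surrounding words, while Skip-gram predicts each surrounding word from the centre word. This is a minimal sketch, assuming a symmetric window of fixed size. The function names `cbow_examples` and `skipgram_examples` are hypothetical, and the neural training step (embedding lookup, softmax or negative sampling) is omitted. It is not the CBOS method itself.

```python
from typing import List, Tuple


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW-style examples: (context words, centre word to predict).

    Hypothetical helper for illustration only; not the CBOS implementation.
    """
    examples = []
    for i, target in enumerate(tokens):
        # Symmetric window, clipped at the sentence boundaries.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, target))
    return examples


def skipgram_examples(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram-style examples: (centre word, one context word to predict).

    Hypothetical helper for illustration only; not the CBOS implementation.
    """
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


if __name__ == "__main__":
    # A short modern Greek sentence: "the cat sits on the rug".
    sentence = "η γάτα κάθεται στο χαλί".split()
    for ctx, tgt in cbow_examples(sentence, window=1):
        print("CBOW     ", ctx, "->", tgt)
    for centre, ctx in skipgram_examples(sentence, window=1):
        print("Skip-gram", centre, "->", ctx)
```

For a five-word sentence and a window of 1, CBOW yields one example per word (five in total), whereas Skip-gram yields one example per (centre, neighbour) pair (eight in total). This difference in how the same window is used is what the two approaches contribute to any ensemble of them.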