GARI: Graph Attention for Relative Isomorphism of Arabic Word Embeddings
The work addresses a core NLP challenge for Arabic language processing, but its contribution is incremental: it builds on existing methods by incorporating semantic variations.
The paper tackles the problem of bilingual lexical induction by improving the relative isomorphism of Arabic word embeddings, resulting in a relative improvement in average P@1 of up to 40.95% and 76.80% for in-domain and domain mismatch settings, respectively.
Bilingual Lexical Induction (BLI) is a core challenge in NLP that relies on the relative isomorphism of individual embedding spaces. Existing attempts to control the relative isomorphism of different embedding spaces fail to incorporate the impact of semantically related words in the model training objective. To address this, we propose GARI, which combines distributional training objectives with multiple isomorphism losses guided by a graph attention network. GARI considers the impact of semantic variations of words in order to define the relative isomorphism of the embedding spaces. Experimental evaluation using the Arabic language data set shows that GARI outperforms existing research, improving the average P@1 by a relative score of up to 40.95% and 76.80% for in-domain and domain mismatch settings, respectively. We release the code for GARI at https://github.com/asif6827/GARI.
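To make the reported metric concrete, the following is a minimal sketch of how P@1 and a relative improvement over a baseline are commonly computed in BLI. It is not taken from the GARI code base: the function names `precision_at_1` and `relative_improvement`, the use of cosine nearest-neighbour retrieval, and the toy vectors are all assumptions made for illustration.

```python
import numpy as np


def precision_at_1(src_vecs, tgt_vecs, gold):
    """Fraction of source words whose nearest target word is a gold translation.

    src_vecs: (n, d) source embeddings, assumed already mapped into the target space.
    tgt_vecs: (m, d) target embeddings.
    gold:     list of n sets; gold[i] holds the correct target row indices for source i.
    """
    # Length-normalise so that the dot product equals cosine similarity.
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sims = src @ tgt.T                # (n, m) cosine similarities
    pred = sims.argmax(axis=1)        # top-1 retrieved target index per source word
    hits = [int(p) in g for p, g in zip(pred, gold)]
    return float(np.mean(hits))


def relative_improvement(new_score, baseline_score):
    """Relative gain of new_score over baseline_score, in percent."""
    return 100.0 * (new_score - baseline_score) / baseline_score


# Toy example (hypothetical data): three source words, three target words.
tgt = np.eye(3)
src = np.array([[1.0, 0.1, 0.0],    # closest to target 0
                [0.0, 1.0, 0.2],    # closest to target 1
                [0.9, 0.0, 1.0]])   # closest to target 2, but gold is target 0
gold = [{0}, {1}, {0}]
p_at_1 = precision_at_1(src, tgt, gold)   # two of three retrievals are correct
```

Read this way, a relative figure such as 40.95% describes the percentage gain in average P@1 over a baseline score, not an absolute increase in percentage points.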