CLAINE · Aug 5, 2021

EENLP: Cross-lingual Eastern European NLP Index

arXiv:2108.02605v3 · 585 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the scarcity of NLP resources for Eastern European languages and gives researchers and practitioners in this domain a foundational index to work from. The contribution is incremental: it compiles existing resources and adds new datasets.

The authors address this scarcity by compiling an index of more than 90 datasets and more than 45 models, and by building cross-lingual datasets for five semantic tasks, on which they establish performance baselines with existing multilingual models.

Motivated by the sparsity of NLP resources for Eastern European languages, we present a broad index of existing Eastern European language resources (90+ datasets and 45+ models) published as a GitHub repository open for updates from the community. Furthermore, to support the evaluation of commonsense reasoning tasks, we provide hand-crafted cross-lingual datasets for five different semantic tasks (namely news categorization, paraphrase detection, Natural Language Inference (NLI) task, tweet sentiment detection, and news sentiment detection) for some of the Eastern European languages. We perform several experiments with the existing multilingual models on these datasets to define the performance baselines and compare them to the existing results for other languages.

Code Implementations: 1 repo