CL, Apr 17, 2014

The First Parallel Multilingual Corpus of Persian: Toward a Persian BLARK

arXiv:1404.4572v1, 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a resource gap in Persian language processing, but it is incremental because it builds on existing guidelines such as EAGLES/MULTEXT.

The authors address the lack of parallel multilingual resources for Persian by creating the first parallel corpus that pairs Persian with more than 10 European languages, and by proposing a new Part-of-Speech categorization and orthography for digital use.

In this article, we introduce the first parallel corpus of Persian with more than 10 European languages. The article describes the primary steps toward preparing a Basic Language Resources Kit (BLARK) for Persian. So far, we have proposed a morphosyntactic specification of Persian based on the EAGLES/MULTEXT guidelines and the specific resources of MULTEXT-East. The article first introduces the Persian language, with emphasis on its orthography and morphosyntactic features; it then proposes a new Part-of-Speech categorization and orthography for Persian in digital environments. Finally, the corpus and related statistics are analyzed.

