CL · Apr 18, 2014

Challenges in Persian Electronic Text Analysis

arXiv:1404.4740v1 · 6 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses problems faced by researchers and developers working with Persian language data, but it is incremental: it catalogs known issues without introducing new solutions.

The paper identifies key challenges in analyzing Persian electronic texts, particularly in transcription and encoding during corpus development, highlighting their crucial impact on processing written corpora.

Farsi, also known as Persian, is the official language of Iran and Tajikistan and one of the two main languages spoken in Afghanistan. Farsi enjoys a unified Arabic script as its writing system. In this paper we briefly introduce the writing standards of Farsi and highlight problems one would face when analyzing Farsi electronic texts, especially the transcription and encoding of Farsi e-texts during the development of Farsi corpora. The points mentioned may sound simple, but they are crucial when developing and processing written corpora of Farsi.
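As an illustration of the kind of encoding problem the abstract alludes to, the sketch below normalizes code points that commonly get mixed in Persian e-texts. It is not taken from the paper: the function name `normalize_persian` and the choice of mappings (Arabic yeh and kaf to their Persian counterparts, Arabic-Indic digits to Extended Arabic-Indic digits) are assumptions based on well-known Unicode differences between Arabic and Persian usage.

```python
import unicodedata

# Assumed mapping (not from the paper): Arabic code points that often
# appear in Persian e-texts, mapped to the code points Persian uses.
_LETTERS = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}
# Arabic-Indic digits (U+0660..U+0669) -> Extended Arabic-Indic (U+06F0..U+06F9)
_DIGITS = {chr(0x0660 + i): chr(0x06F0 + i) for i in range(10)}

_TABLE = str.maketrans({**_LETTERS, **_DIGITS})


def normalize_persian(text: str) -> str:
    """Return text in NFC form with Arabic variants replaced by Persian ones."""
    # NFC first, so composed and decomposed sequences compare equal.
    text = unicodedata.normalize("NFC", text)
    return text.translate(_TABLE)


if __name__ == "__main__":
    mixed = "\u0643\u062A\u0627\u0628"  # "book" typed with Arabic kaf
    clean = normalize_persian(mixed)
    print([f"U+{ord(c):04X}" for c in clean])
```

Without a step like this, two strings that look identical on screen can fail to match in a corpus search or frequency count because they differ only in the underlying code points.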

