CL, Dec 14, 2022

Quotations, Coreference Resolution, and Sentiment Annotations in Croatian News Articles: An Exploratory Study

Jelena Sarajlić, Gaurish Thakkar, Diego Alves, Nives Mikelic Preradović

arXiv:2212.07172v1, 0.3, h-index: 8

Originality: Synthesis-oriented

AI Analysis

The corpus provides a resource for NLP tasks in Croatian, but the contribution is incremental: it adapts existing annotation methods to a new language.

The paper tackles direct-speech extraction in Croatian by creating a corpus annotated for quotations, coreference resolution, and sentiment. It identifies language-specific differences from English and derives a list of phenomena that require special attention during annotation.

This paper presents a corpus annotated for the task of direct-speech extraction in Croatian. The paper focuses on quotation annotation, coreference resolution, and sentiment annotation in the Croatian SETimes news corpus, and on the analysis of language-specific differences compared to English. From this analysis, a list of phenomena that require special attention when performing these annotations is derived. The resulting corpus, annotated with quotation features, can be used for multiple tasks in the field of Natural Language Processing.
