CL · Mar 11, 2017

Extending Automatic Discourse Segmentation for Texts in Spanish to Catalan

arXiv:1703.04718v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This work fills a gap in NLP tooling for Catalan, but it is incremental: it extends existing methods developed for Spanish.

The authors address the lack of a discourse segmenter for Catalan by building the first such tool. It adapts RST-based segmentation rules written for Spanish, using lexical and syntactic information, and the authors report promising results on a manually segmented gold standard corpus.

Automatic discourse analysis is currently an active research topic in NLP, yet discourse is one of the most difficult phenomena to process. Although discourse parsers have already been developed for several languages, no such tool exists for Catalan. The first step towards implementing this kind of parser is to develop a discourse segmenter. In this article we present the first discourse segmenter for texts in Catalan. The segmenter is based on Rhetorical Structure Theory (RST) for Spanish, and uses lexical and syntactic information to translate rules valid for Spanish into rules for Catalan. We have evaluated the system against a gold standard corpus of manually segmented texts, and the results are promising.
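
The abstract does not say which metrics were used in the evaluation. One common way to score a segmenter against a gold standard is precision, recall, and F1 over segment-boundary positions. The following is a minimal sketch under that assumption, not the authors' evaluation code: the function name `boundary_scores`, the representation of boundaries as token indices, and the sample data are all hypothetical.

```python
def boundary_scores(gold, predicted):
    """Return (precision, recall, F1) over segment-boundary positions.

    Assumption: each boundary is identified by the index of the token
    after which a new discourse segment starts.
    """
    gold, predicted = set(gold), set(predicted)
    # A predicted boundary counts as correct only if it matches exactly.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical boundaries for one text (not data from the paper).
gold_boundaries = [4, 9, 15]       # manual segmentation
predicted_boundaries = [4, 9, 12]  # segmenter output

precision, recall, f1 = boundary_scores(gold_boundaries, predicted_boundaries)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Exact matching is the strictest choice; a tolerance window around each gold boundary would be a more lenient alternative.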
