CL, Apr 10, 2017

Automatic semantic role labeling on non-revised syntactic trees of journalistic texts

Nathan Siegle Hartmann, Magali Sanches Duran, Sandra Maria Aluísio

arXiv:1704.03016v1, 0.79 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a practical concern for NLP researchers and practitioners working with Brazilian Portuguese: how SRL performs on non-revised syntactic trees. It is incremental, building on existing SRL studies.

The paper tackles semantic role labeling (SRL) on non-revised syntactic trees of Brazilian Portuguese journalistic texts. It shows that, when evaluated on non-revised trees, a system trained on non-revised trees outperforms one trained on gold-standard data.

Semantic Role Labeling (SRL) is a Natural Language Processing task that enables the detection of the events described in sentences and of the participants in those events. For Brazilian Portuguese (BP), two recently concluded studies perform SRL on journalistic texts. [1] obtained an F1-measure of 79.6 using the PropBank.Br corpus, whose syntactic trees are manually revised; [8], without using a treebank for training, obtained an F1-measure of 68.0 on the same corpus. However, the use of manually revised syntactic trees for this task does not represent a real application scenario. The goal of this paper is to evaluate the performance of SRL on revised and non-revised syntactic trees using a larger and balanced corpus of BP journalistic texts. First, we show that [1]'s system also performs better than [8]'s system on the larger corpus. Second, we show that an SRL system trained on non-revised syntactic trees performs better on non-revised trees than a system trained on gold-standard data.
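
The F1-measure scores quoted above combine precision and recall over labeled arguments. The following is a minimal sketch of that computation, not the evaluation script used in the paper. It assumes an exact-match convention, in which a predicted argument counts as correct only if its predicate, span, and role label all match the gold annotation. The `Argument` tuple layout, the `f1_measure` function, and the sample role labels are hypothetical names chosen for illustration.

```python
from typing import Set, Tuple

# Hypothetical layout: (predicate index, span start, span end, role label).
Argument = Tuple[int, int, int, str]


def f1_measure(gold: Set[Argument], predicted: Set[Argument]) -> float:
    """Return F1 as a percentage under an assumed exact-match convention."""
    correct = len(gold & predicted)  # arguments matching on every field
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Illustrative data only, not taken from the paper's corpus.
gold = {(1, 0, 0, "A0"), (1, 2, 3, "A1"), (1, 4, 5, "AM-TMP")}
predicted = {
    (1, 0, 0, "A0"),
    (1, 2, 3, "A1"),
    (1, 4, 5, "AM-LOC"),  # right span, wrong label: counted as an error
    (1, 6, 6, "AM-MNR"),  # spurious argument
}
score = f1_measure(gold, predicted)
print(f"F1 = {score:.1f}")
```

In this toy case two of four predictions are correct (precision 0.5) and two of three gold arguments are recovered (recall about 0.67), so a single mislabeled span lowers both figures at once.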
