CL · Apr 14, 2020

Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

arXiv:2004.06295v2 · Has Code

Originality: Incremental advance

AI Analysis

The work addresses SRL for low-resource languages, but it is an incremental advance that builds on existing cross-lingual approaches.

The paper tackles the problem of semantic role labeling (SRL) for low-resource languages by proposing a translation-based method to create training datasets from source annotations, achieving significant performance improvements on the Universal Proposition Bank.

Much research effort has been devoted to semantic role labeling (SRL), which is crucial for natural language understanding. Supervised approaches have achieved impressive performance when large-scale corpora are available, as they are for resource-rich languages such as English. For low-resource languages with no annotated SRL dataset, however, competitive performance remains difficult to obtain. Cross-lingual SRL is one promising way to address this problem, and it has advanced considerably through model transfer and annotation projection. In this paper, we propose a novel alternative based on corpus translation, constructing high-quality training datasets for the target languages from the source gold-standard SRL annotations. Experimental results on the Universal Proposition Bank show that the translation-based method is highly effective and that the automatically constructed pseudo datasets can significantly improve target-language SRL performance.
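To make the idea of carrying gold annotations over to a translated sentence more concrete, here is a minimal sketch of label projection through a word alignment. It assumes one predicate per sentence, token-level role tags without BIO prefixes (e.g. `V`, `A0`, `A1`, `O`), and a word alignment supplied by some external aligner as (source index, target index) pairs. The function names `project_srl_labels` and `project_or_discard` and the "drop sentences whose predicate is lost" filter are hypothetical illustrations, not the paper's own projection or quality-control procedure.

```python
from typing import List, Optional, Tuple

Alignment = List[Tuple[int, int]]  # hypothetical format: (source_idx, target_idx) pairs


def project_srl_labels(src_labels: List[str], tgt_len: int, alignment: Alignment) -> List[str]:
    """Copy token-level SRL tags from a source sentence onto its translation.

    Target tokens with no aligned labeled source token stay 'O'. When several
    source tokens align to one target token, the predicate tag 'V' wins;
    otherwise the first non-'O' label seen is kept.
    """
    tgt_labels = ["O"] * tgt_len
    for s, t in alignment:
        label = src_labels[s]
        if label == "O":
            continue
        if tgt_labels[t] == "O" or label == "V":
            tgt_labels[t] = label
    return tgt_labels


def project_or_discard(src_labels: List[str], tgt_len: int, alignment: Alignment) -> Optional[List[str]]:
    """Hypothetical quality filter: discard the pair if no target token received the predicate tag."""
    tgt_labels = project_srl_labels(src_labels, tgt_len, alignment)
    return tgt_labels if "V" in tgt_labels else None


if __name__ == "__main__":
    # "The cat chased the mouse" -> "Le chat a chassé la souris"
    src = ["A0", "A0", "V", "A1", "A1"]
    align = [(0, 0), (1, 1), (2, 2), (2, 3), (3, 4), (4, 5)]
    print(project_srl_labels(src, 6, align))
    # → ['A0', 'A0', 'V', 'V', 'A1', 'A1']
```

A one-to-many alignment ("chased" to "a chassé") simply spreads the predicate tag over both target tokens. Real projection pipelines typically also have to handle noisy alignments and translation errors, which this sketch ignores.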
