Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus
This work addresses the challenge of SRL for low-resource languages, though the contribution is incremental, as it builds on existing cross-lingual approaches.
The paper tackles the problem of semantic role labeling (SRL) for low-resource languages by proposing a translation-based method to create training datasets from source annotations, achieving significant performance improvements on the Universal Proposition Bank.
Much research effort has been devoted to semantic role labeling (SRL), which is crucial for natural language understanding. Supervised approaches achieve impressive performance when large-scale corpora are available, as they are for resource-rich languages such as English. For low-resource languages with no annotated SRL dataset, however, obtaining competitive performance remains challenging. Cross-lingual SRL is one promising way to address this problem, and it has advanced considerably with the help of model transfer and annotation projection. In this paper, we propose a novel alternative based on corpus translation, which constructs high-quality training datasets for the target languages from the source gold-standard SRL annotations. Experimental results on the Universal Proposition Bank show that the translation-based method is highly effective, and that the automatically constructed pseudo datasets significantly improve target-language SRL performance.
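As an illustration only, the idea of carrying source-side role labels over to a translated sentence can be sketched as below. The function name `project_roles`, the per-token label scheme (with "O" for tokens that carry no role), and the hand-written word alignment are all assumptions made for this sketch; the abstract does not specify the actual translation or projection procedure, so this should not be read as the paper's method.

```python
from typing import Dict, List


def project_roles(
    src_labels: List[str],
    alignment: Dict[int, int],
    tgt_len: int,
) -> List[str]:
    """Copy each labeled source token's role to its aligned target token.

    Hypothetical helper: `alignment` maps a source token index to a target
    token index. Target tokens with no labeled aligned source token keep "O".
    """
    tgt_labels = ["O"] * tgt_len
    for src_idx, tgt_idx in alignment.items():
        label = src_labels[src_idx]
        # Skip unlabeled source tokens and out-of-range target indices.
        if label != "O" and 0 <= tgt_idx < tgt_len:
            tgt_labels[tgt_idx] = label
    return tgt_labels


# Toy example (assumed data): English source with gold labels,
# a German translation, and a manually specified alignment.
src_tokens = ["She", "has", "bought", "books"]
src_labels = ["A0", "O", "V", "A1"]
tgt_tokens = ["Sie", "hat", "Bücher", "gekauft"]
alignment = {0: 0, 1: 1, 2: 3, 3: 2}  # the verb moves to the end in German

tgt_labels = project_roles(src_labels, alignment, len(tgt_tokens))
for token, label in zip(tgt_tokens, tgt_labels):
    print(f"{token}\t{label}")
```

In this toy setting the labels follow the reordered words, so the translated sentence can serve as a pseudo training instance for the target language. A realistic pipeline would obtain both the translation and the alignment automatically and would need to handle unaligned or many-to-one tokens.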