CL, Jul 25, 2019

Cross-Lingual Transfer for Distantly Supervised and Low-resources Indonesian NER

arXiv:1907.11158v10.2

Originality: Incremental advance

AI Analysis

This addresses NER for low-resource languages like Indonesian, but it is incremental as it builds on existing cross-lingual transfer methods.

The paper tackles named entity recognition (NER) for low-resource Indonesian by fine-tuning language models pre-trained on high-resource languages. It reports significant improvement on small gold corpora and results competitive with supervised cross-lingual transfer on large silver corpora.

Annotated corpora for low-resource languages are usually either small but manually annotated (gold) or large but distantly supervised (silver). Inspired by recent progress in injecting pre-trained language models (LMs) into many Natural Language Processing (NLP) tasks, we propose fine-tuning pre-trained language models from high-resource languages to low-resource languages to improve performance in both scenarios. Our experiments demonstrate significant improvement when fine-tuning a pre-trained language model in cross-lingual transfer scenarios for a small gold corpus, and competitive results on a large silver corpus compared to supervised cross-lingual transfer, which is useful when no parallel annotation for the same task is available to begin with. We compare our proposed method of cross-lingual transfer using a pre-trained LM against other sources of transfer, such as a monolingual LM and Part-of-Speech (POS) tagging, on the downstream task for both the large silver and the small gold NER datasets, exploiting the character-level input of the bi-directional language model task.
