Cross-lingual Extended Named Entity Classification of Wikipedia Articles
This work addresses the classification of named entities across multiple languages, with applications in multilingual information extraction; its contribution is incremental, as it builds on existing methods for a specific task.
The paper tackled the problem of cross-lingual named entity classification in Wikipedia articles by proposing a three-stage approach involving multilingual pre-training, monolingual fine-tuning, and cross-lingual voting, achieving the best scores for 25 out of 30 languages with small accuracy gaps in the remaining five.
The FPT.AI team participated in the SHINRA2020-ML subtask of the NTCIR-15 SHINRA task. This paper describes our approach to the problem and discusses the official results. Our method focuses on learning cross-lingual representations at both the word level and the document level for page classification. We propose a three-stage approach consisting of multilingual model pre-training, monolingual model fine-tuning, and cross-lingual voting. Our system achieves the best scores for 25 out of 30 languages, and its accuracy gaps to the best-performing systems in the other five languages are relatively small.
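
To make the third stage concrete, the following is a minimal sketch of how cross-lingual voting could combine per-language predictions for the same entity. The abstract does not specify the voting rule, so everything here is an assumption for illustration: the function name `cross_lingual_vote`, the `min_share` threshold, the fallback to the single most-voted label, and the example label names are all hypothetical and are not taken from the system described above.

```python
from collections import Counter
from typing import Dict, List, Set


def cross_lingual_vote(
    predictions: Dict[str, Set[str]], min_share: float = 0.5
) -> List[str]:
    """Combine label sets predicted by several monolingual models.

    `predictions` maps a language code to the set of labels that the
    model for that language assigned to the page describing one entity.
    A label is kept when at least `min_share` of the languages voted
    for it (an assumed rule, not the one used by the original system).
    """
    if not predictions:
        return []

    # Count how many languages voted for each label.
    counts = Counter(
        label for labels in predictions.values() for label in labels
    )
    if not counts:
        return []

    threshold = min_share * len(predictions)
    kept = sorted(label for label, n in counts.items() if n >= threshold)
    if kept:
        return kept

    # Fallback: no label reached the threshold, so return the single
    # most-voted label, breaking ties alphabetically.
    best = max(counts.values())
    return [min(label for label, n in counts.items() if n == best)]


if __name__ == "__main__":
    # Illustrative labels only; the language codes and classes are made up.
    votes = {"en": {"Person"}, "fr": {"Person"}, "de": {"City"}}
    print(cross_lingual_vote(votes))
```

In this sketch, an entity whose English and French pages are both labelled `Person` while the German page is labelled `City` ends up with `Person`, since two of three languages agree. A weighted variant (for example, weighting each language by its model's validation accuracy) would be a natural alternative, but that too would be an assumption beyond what the abstract states.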