CL · Jun 30, 2022

Domain Adaptive Pretraining for Multilingual Acronym Extraction

arXiv:2206.15221v1 · 0.35 citations · h-index: 20

Originality: Synthesis-oriented

AI Analysis

This work addresses acronym extraction in multilingual scientific and legal domains, but it is incremental: it applies an existing method, with domain adaptation, to a new shared task.

The paper tackles multilingual acronym extraction from scientific and legal documents in six languages using a BiLSTM-CRF model with domain-adapted XLM-RoBERTa embeddings, achieving competitive performance across all languages.
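The page names a BiLSTM-CRF tagger but gives no implementation details. As a minimal sketch, assume the task is framed as BIO sequence tagging with a hypothetical tag set in which "short" marks an acronym and "long" its expanded form. A BiLSTM over XLM-RoBERTa embeddings would produce per-token emission scores; the CRF layer adds tag-to-tag transition scores, and Viterbi decoding picks the best-scoring tag sequence, which is then collapsed into spans. The tag names, the hand-set transition penalties (a trained CRF learns these), and the emission scores are all illustrative assumptions, not values from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical BIO tag set: "short" = acronym, "long" = its expanded long form.
TAGS = ["O", "B-short", "I-short", "B-long", "I-long"]


def bio_transitions(tags: Sequence[str], penalty: float = -1e4) -> List[List[float]]:
    """Hand-set transition scores that forbid invalid BIO moves.

    A trained CRF learns these scores; here we only penalise an I-x tag that
    does not follow B-x or I-x (e.g. O -> I-long).
    """
    n = len(tags)
    trans = [[0.0] * n for _ in range(n)]
    for i, prev in enumerate(tags):
        for j, cur in enumerate(tags):
            if cur.startswith("I-") and prev[2:] != cur[2:]:
                trans[i][j] = penalty
    return trans


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring tag-index path for a non-empty sentence.

    emissions[t][j]: score of tag j at token t (e.g. a BiLSTM's output).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # Best previous tag i for ending at tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Follow backpointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return path[::-1]


def bio_to_spans(tags: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Collapse BIO tags into (label, start, end) spans; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    label, start = None, 0
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                spans.append((label, start, i))
            label, start = (tag[2:], i) if tag != "O" else (None, i)
    return spans


if __name__ == "__main__":
    tokens = ["neural", "network", "(", "NN", ")"]
    # Illustrative emission scores, one row per token, columns ordered as TAGS.
    emissions = [[0, 0, 0, 3, 0], [1, 0, 0, 0, 2], [3, 0, 0, 0, 0],
                 [0, 3, 0, 0, 0], [3, 0, 0, 0, 0]]
    tags = [TAGS[k] for k in viterbi_decode(emissions, bio_transitions(TAGS))]
    print(bio_to_spans(tags))  # [('long', 0, 2), ('short', 3, 4)]
```

Decoding jointly over the whole sentence, rather than taking each token's top-scoring tag independently, is what lets the CRF rule out ill-formed sequences such as an `I-long` that starts a span.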

This paper presents our findings from participating in the multilingual acronym extraction shared task at SDU@AAAI-22. The task consists of extracting acronyms from documents in six languages within the scientific and legal domains. To address multilingual acronym extraction, we employed a BiLSTM-CRF model with multilingual XLM-RoBERTa embeddings. We further pretrained XLM-RoBERTa on the shared-task corpus to adapt its embeddings to the shared-task domain(s). Our system (team: SMR-NLP) achieved competitive performance for acronym extraction across all languages.
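Pretraining XLM-RoBERTa on the shared-task corpus amounts to continuing its masked-language-modelling (MLM) objective on in-domain text. The abstract does not give the masking settings, so the sketch below assumes the standard BERT/RoBERTa recipe: select 15% of non-special tokens, replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged. The token ids, `mask_id`, and `vocab_size` are hypothetical placeholders for a real tokenizer's values; unselected positions get the label -100, the default `ignore_index` of PyTorch's cross-entropy loss.

```python
import random
from typing import FrozenSet, List, Optional, Sequence, Tuple

IGNORE = -100  # label for positions excluded from the MLM loss


def mask_tokens(token_ids: Sequence[int], mask_id: int, vocab_size: int,
                special_ids: FrozenSet[int] = frozenset(),
                mlm_prob: float = 0.15,
                rng: Optional[random.Random] = None) -> Tuple[List[int], List[int]]:
    """BERT/RoBERTa-style masking: returns (model inputs, MLM labels)."""
    rng = rng if rng is not None else random.Random()
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens; select each other token with prob mlm_prob.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model is trained to recover the original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id                    # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random vocabulary token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

Calling `mask_tokens` afresh on each epoch re-samples the masked positions, which is RoBERTa-style dynamic masking. In practice a library collator (for example Hugging Face's `DataCollatorForLanguageModeling` with its `mlm_probability` parameter) would typically handle this step; the hand-written version only makes the recipe explicit.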

