Domain Adaptive Pretraining for Multilingual Acronym Extraction
This work addresses acronym extraction in multilingual scientific and legal documents, but its contribution is incremental: it applies an existing method, with domain adaptation, to a new shared task.
The paper tackles multilingual acronym extraction from scientific and legal documents in six languages using a BiLSTM-CRF model with domain-adapted XLM-RoBERTa embeddings, and reports competitive performance across all languages.
This paper presents our findings from participating in the multilingual acronym extraction shared task SDU@AAAI-22. The task consists of acronym extraction from documents in six languages within the scientific and legal domains. To address multilingual acronym extraction, we employed a BiLSTM-CRF with multilingual XLM-RoBERTa embeddings. We pretrained the XLM-RoBERTa model on the shared task corpus to further adapt its embeddings to the shared task domain(s). Our system (team: SMR-NLP) achieved competitive performance for acronym extraction across all the languages.
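
The abstract gives no implementation details, so the following is only a minimal sketch of the decoding step in a BiLSTM-CRF tagger, not the authors' code. It assumes a hypothetical BIO tag set (`O`, `B-ACR`, `I-ACR`), hand-written emission scores standing in for the BiLSTM output over XLM-RoBERTa embeddings, and illustrative function names (`viterbi`, `bio_to_spans`). It shows how CRF transition scores can rule out invalid tag sequences and how the decoded tags map back to acronym spans.

```python
from typing import Dict, List, Tuple

# Hypothetical tag set; the shared task's actual labels may differ.
LABELS = ["O", "B-ACR", "I-ACR"]


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[i][y] is the score of label y at token i (in a real system,
    produced by the BiLSTM). transitions[(p, y)] is the score of moving
    from label p to label y; missing pairs default to 0.0.
    """
    if not emissions:
        return []
    # best[i][y] = best score of any path that ends in label y at token i
    best = [{y: emissions[0][y] for y in LABELS}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev = max(
                LABELS,
                key=lambda p: best[-1][p] + transitions.get((p, y), 0.0),
            )
            scores[y] = (
                best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[i][y]
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final label to recover the path.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int]]:
    """Convert BIO labels to half-open (start, end) token spans."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, label in enumerate(labels):
        if label.startswith("I-") and start is not None:
            continue  # still inside the current span
        if start is not None:
            spans.append((start, i))  # close the open span
            start = None
        if label.startswith("B-"):
            start = i  # open a new span
    if start is not None:
        spans.append((start, len(labels)))
    return spans


if __name__ == "__main__":
    tokens = ["The", "CRF", "model"]
    # Toy scores: taken alone, token 1 would prefer I-ACR over B-ACR.
    emissions = [
        {"O": 2.0, "B-ACR": 0.0, "I-ACR": 0.0},
        {"O": 0.0, "B-ACR": 1.0, "I-ACR": 1.5},
        {"O": 2.0, "B-ACR": 0.0, "I-ACR": 0.0},
    ]
    # Penalize the invalid transition O -> I-ACR.
    transitions = {("O", "I-ACR"): -10.0}
    tags = viterbi(emissions, transitions)
    print(tags)
    print([tokens[s:e] for s, e in bio_to_spans(tags)])
```

In this toy input the transition penalty makes the decoder choose `B-ACR` for the middle token even though its emission score for `I-ACR` is higher, which is the main reason to put a CRF layer on top of per-token scores.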