CL Apr 22, 2023

Romanian Multiword Expression Detection Using Multilingual Adversarial Training and Lateral Inhibition

Andrei-Marius Avram, Verginica Barbu Mititelu, Dumitru-Clementin Cercel

arXiv:2304.11350v2

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific problem for natural language processing in Romanian, representing an incremental improvement over previous methods.

The paper tackles the problem of automatically identifying Romanian multiword expressions. It improves the F1-score of XLM-RoBERTa by approximately 2.7% on unseen expressions and achieves state-of-the-art performance on the PARSEME v1.2 shared task corpus.

Multiword expressions are a key ingredient for developing large-scale and linguistically sound natural language processing technology. This paper describes our improvements in automatically identifying Romanian multiword expressions on the corpus released for the PARSEME v1.2 shared task. Our approach assumes a multilingual perspective based on the recently introduced lateral inhibition layer and adversarial training to boost the performance of the employed multilingual language models. With the help of these two methods, we improve the F1-score of XLM-RoBERTa by approximately 2.7% on unseen multiword expressions, the main task of the PARSEME v1.2 edition. In addition, our results can be considered state-of-the-art, as they outperform the previous results on Romanian obtained by the participants in this competition.
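
The abstract names the lateral inhibition layer without describing it. As a rough illustration only, the sketch below shows the forward pass of one formulation in which each feature is kept or zeroed by a Heaviside gate driven by the other features. The formulation, the function name `lateral_inhibition`, and the toy weights are assumptions for illustration, not details taken from this page. Training such a layer would also need a surrogate gradient for the step function, which is omitted here.

```python
from typing import List


def lateral_inhibition(x: List[float],
                       w: List[List[float]],
                       b: List[float]) -> List[float]:
    """Forward pass of a lateral-inhibition-style gate (assumed formulation).

    out[i] = x[i] * H(sum over j != i of x[j] * w[j][i] + b[i])
    where H is the Heaviside step (1 if its argument is > 0, else 0).
    """
    n = len(x)
    out = []
    for i in range(n):
        # Skip j == i so that a feature never gates itself (zeroed diagonal).
        drive = sum(x[j] * w[j][i] for j in range(n) if j != i) + b[i]
        gate = 1.0 if drive > 0 else 0.0
        out.append(x[i] * gate)
    return out


# Toy example with hypothetical weights: feature 0 inhibits feature 1.
x = [2.0, 1.0, 0.5]
w = [
    [0.0, -1.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
b = [0.1, 0.1, 0.1]
gated = lateral_inhibition(x, w, b)
```

In this toy input the strong first feature pushes the gate of the second feature below zero, so the second feature is suppressed while the other two pass through unchanged.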
