CLNov 4, 2020

MTLB-STRUCT @PARSEME 2020: Capturing Unseen Multiword Expressions Using Multi-task Learning and Pre-trained Masked Language Models

arXiv:2011.02541v1992 citations
AI Analysis

This work addresses the challenge of capturing multiword expressions in natural language processing, particularly for multilingual applications, though it is incremental as it builds on existing BERT and multi-task learning approaches.

The paper tackled the problem of identifying verbal multiword expressions (VMWEs), including unseen ones, by developing a semi-supervised system that jointly learns VMWE tagging and dependency parsing using pre-trained multilingual BERT. The system achieved first place in the PARSEME shared task 2020, with top F1-scores for both unseen and general VMWEs averaged across 14 languages.

This paper describes a semi-supervised system that jointly learns verbal multiword expressions (VMWEs) and dependency parse trees as an auxiliary task. The model benefits from pre-trained multilingual BERT. BERT hidden layers are shared among the two tasks and we introduce an additional linear layer to retrieve VMWE tags. The dependency parse tree prediction is modelled by a linear layer and a bilinear one plus a tree CRF on top of BERT. The system has participated in the open track of the PARSEME shared task 2020 and ranked first in terms of F1-score in identifying unseen VMWEs as well as VMWEs in general, averaged across all 14 languages.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes