CL, May 18, 2023

Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction

arXiv:2305.10985v1, 250 citations
Originality: Synthesis-oriented
AI Analysis

This paper provides a multi-lingual dataset for NLP researchers, but the contribution is incremental: it builds on an existing dataset through machine translation.

The authors address the lack of multi-lingual resources in Relation Extraction by creating Multi-CrossRE, a dataset covering 26 languages in addition to English and six text domains. A baseline model scores consistently on the back-translated data and on the original English CrossRE, which the authors take as evidence of high translation quality.

Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. Multi-CrossRE is a machine-translated version of CrossRE (Bassignana and Plank, 2022), with a sub-portion including more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and, as a sanity check, over the 26 back-translations to English. Results on the back-translated data are consistent with the ones on the original English CrossRE, indicating high quality of the translation and the resulting dataset.
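The back-translation sanity check described in the abstract can be illustrated with a minimal sketch. It assumes the baseline's predictions are already available as lists of relation labels, and that macro-F1 is the score being compared; the abstract states neither. The function names, language codes, and toy labels are hypothetical and are not taken from the paper's code.

```python
from typing import Dict, Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Macro-averaged F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def back_translation_gaps(
    gold: Sequence[str],
    preds_original: Sequence[str],
    preds_back_translated: Dict[str, Sequence[str]],
) -> Dict[str, float]:
    """Score difference (back-translated minus original English) per pivot language.

    A gap near zero suggests the round trip through that language preserved
    the information the model relies on.
    """
    base = macro_f1(gold, preds_original)
    return {
        lang: macro_f1(gold, preds) - base
        for lang, preds in preds_back_translated.items()
    }


# Toy data: the labels and language codes are made up for illustration.
gold = ["part-of", "usage", "part-of", "role"]
preds_original = ["part-of", "usage", "part-of", "role"]
preds_back_translated = {
    "de": ["part-of", "usage", "part-of", "role"],
    "ja": ["part-of", "usage", "usage", "role"],
}
gaps = back_translation_gaps(gold, preds_original, preds_back_translated)
```

In this toy run the gap for `"de"` is zero and the gap for `"ja"` is negative, which is the kind of signal that would flag a lower-quality translation direction.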

Code Implementations: 1 repo
