CL | Apr 3, 2024

MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness

Shijia Zhou, Huangyan Shan, Barbara Plank, Robert Litschko

arXiv:2404.02570v114.427 citations | h-index: 9 | SemEval

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of cross-lingual transfer for low-resource languages in NLP, but it is incremental as it builds on existing models and competition tasks.

The paper tackles zero-shot cross-lingual semantic textual relatedness by comparing source language selection strategies with pre-trained models, and its submission placed first on the Kinyarwanda test set in the SemEval-2024 competition.

This paper presents our system developed for SemEval-2024 Task 1: Semantic Textual Relatedness (STR), Track C: Cross-lingual. The task aims to detect the semantic relatedness of two sentences in a given target language without access to direct supervision (i.e., zero-shot cross-lingual transfer). To this end, we focus on different source language selection strategies with two different pre-trained language models: XLM-R and Furina. We experiment with 1) single-source transfer, selecting source languages based on typological similarity, 2) augmenting English training data with the two nearest-neighbor source languages, and 3) multi-source transfer, where we compare selecting on all training languages against languages from the same family. We further study machine translation-based data augmentation and the impact of script differences. Our submission achieved first place on the C8 (Kinyarwanda) test set.
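
To make the nearest-neighbor selection idea concrete, here is a minimal sketch of ranking candidate source languages by similarity to a target language. It assumes each language is represented by a typological feature vector and that similarity is measured with cosine similarity; the abstract does not state which features or distance the authors used. The language codes (`tgt`, `src_a`, ...), the vectors, and the function names are hypothetical and invented purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_sources(target, candidates, features, k=2):
    """Return the k candidate languages most similar to the target."""
    ranked = sorted(
        candidates,
        key=lambda lang: cosine(features[target], features[lang]),
        reverse=True,
    )
    return ranked[:k]


# Toy binary feature vectors (hypothetical, not real typological data).
features = {
    "tgt": [1, 0, 1, 1],
    "src_a": [1, 0, 1, 0],
    "src_b": [0, 1, 0, 0],
    "src_c": [1, 1, 1, 1],
}

selected = nearest_sources("tgt", ["src_a", "src_b", "src_c"], features, k=2)
print(selected)
```

With `k=1` this would correspond to a single-source setup, and with `k=2` the selected languages could be added to English training data, as in the second strategy described above. How the selected data is then used for fine-tuning is outside the scope of this sketch.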
