CL | Apr 3, 2024

MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness

arXiv:2404.02570v1 | 27 citations | h-index: 9 | SemEval
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of cross-lingual transfer for low-resource languages in NLP, but its contribution is incremental, since it builds on existing models and competition tasks.

The paper tackles zero-shot cross-lingual semantic textual relatedness by comparing source language selection strategies for pre-trained models; its submission placed first on the Kinyarwanda test set in the SemEval-2024 competition.

This paper presents our system developed for SemEval-2024 Task 1: Semantic Textual Relatedness (STR), Track C: Cross-lingual. The task aims to detect the semantic relatedness of two sentences in a given target language without access to direct supervision (i.e., zero-shot cross-lingual transfer). To this end, we focus on different source language selection strategies with two pre-trained language models: XLM-R and Furina. We experiment with 1) single-source transfer, selecting source languages based on typological similarity, 2) augmenting English training data with the two nearest-neighbor source languages, and 3) multi-source transfer, where we compare selecting all training languages against selecting only languages from the same family. We further study machine translation-based data augmentation and the impact of script differences. Our submission achieved first place on the C8 (Kinyarwanda) test set.
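The first two selection strategies in the abstract (single-source transfer by typological similarity, and augmenting English with the two nearest-neighbor languages) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors below are made-up placeholders rather than real typological features, the names `src_a`, `src_b`, `src_c`, `TOY_FEATURES`, and `nearest_sources` are hypothetical, and cosine similarity is assumed as the similarity measure purely for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as maximally dissimilar
    return dot / (norm_u * norm_v)


def nearest_sources(target_vec, source_vecs, k=2):
    """Return the k source-language names most similar to the target."""
    ranked = sorted(
        source_vecs,
        key=lambda name: cosine_similarity(target_vec, source_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Placeholder binary "typological" vectors (illustrative values only,
# not taken from any real typological database).
TOY_FEATURES = {
    "src_a": [1, 0, 1, 1, 0],
    "src_b": [1, 1, 1, 1, 0],
    "src_c": [0, 1, 0, 0, 1],
}
TARGET = [1, 0, 1, 0, 0]

# Strategy 1: single-source transfer from the most similar language.
single_source = nearest_sources(TARGET, TOY_FEATURES, k=1)

# Strategy 2: English plus the two nearest-neighbor source languages.
augmented_sources = ["eng"] + nearest_sources(TARGET, TOY_FEATURES, k=2)

print(single_source)
print(augmented_sources)
```

In a real setup, the selected languages would determine which training sets are used to fine-tune the model before it is evaluated zero-shot on the target language.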

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
