CL · Apr 26

XITE: Cross-lingual Interpolation for Transfer using Embeddings

arXiv:2604.23589 · 80.4

Predicted impact: top 69% in CL · last 90 days · Originality: Incremental advance

AI Analysis

For practitioners of cross-lingual NLP, XITE offers a simple yet effective method to boost performance on low-resource languages without sacrificing high-resource language accuracy.

XITE proposes an embedding-based data augmentation technique that creates synthetic training data by interpolating embeddings of low-resource target language text with English counterparts, achieving up to 35.91% improvement in sentiment analysis and up to 81.16% in natural language inference across diverse languages using XLM-R.

Facilitating cross-lingual transfer in multilingual language models remains a critical challenge. Towards this goal, we propose an embedding-based data augmentation technique called XITE. We start with unlabeled text from a low-resource target language, identify an English counterpart in a task-specific training corpus using embedding-based similarities and adopt its label. Next, we perform a simple interpolation of the source and target embeddings to create synthetic data for task-specific fine-tuning. Projecting the target text into a language-rich subspace using linear discriminant analysis (LDA), prior to interpolation, further boosts performance. Our cross-lingual embedding-based augmentation technique XITE yields significant improvements of up to 35.91% for sentiment analysis and up to 81.16% for natural language inference, using XLM-R, for a diverse set of target languages including Korean, Arabic, Urdu and Hindi. Apart from boosting cross-lingual transfer, adaptation using XITE also safeguards against forgetting and maintains task performance on the high-resource language.
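
The retrieve, adopt-label, and interpolate steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the mixing coefficient, or which embeddings are used, so cosine similarity, a single weight `lam`, and fixed-size sentence vectors are all assumptions here, as is the function name `xite_style_augment`. The LDA projection step is omitted.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Row-wise L2 normalization; eps guards against zero vectors.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def xite_style_augment(target_emb, source_emb, source_labels, lam=0.5):
    """Hypothetical sketch of embedding-based augmentation.

    target_emb:    (n_t, d) embeddings of unlabeled target-language text
    source_emb:    (n_s, d) embeddings of labeled English training text
    source_labels: (n_s,)   task labels for the English text
    lam:           assumed mixing weight on the target embedding
    """
    # 1. Find the most similar English example (cosine similarity assumed).
    sims = l2_normalize(target_emb) @ l2_normalize(source_emb).T
    nearest = sims.argmax(axis=1)

    # 2. Adopt the label of the retrieved English counterpart.
    pseudo_labels = source_labels[nearest]

    # 3. Linearly interpolate target and source embeddings.
    mixed = lam * target_emb + (1.0 - lam) * source_emb[nearest]
    return mixed, pseudo_labels, nearest

# Toy data: two "target" vectors that are noisy copies of source rows 2 and 4.
rng = np.random.default_rng(0)
source_emb = rng.normal(size=(6, 8))
source_labels = np.array([0, 1, 0, 1, 1, 0])
target_emb = source_emb[[2, 4]] + 0.01 * rng.normal(size=(2, 8))

mixed, pseudo_labels, nearest = xite_style_augment(
    target_emb, source_emb, source_labels, lam=0.5
)
```

In a real pipeline the vectors would come from a multilingual encoder such as XLM-R, and the interpolated embeddings with their adopted labels would serve as synthetic examples for task-specific fine-tuning.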
