CL Feb 24, 2021

Task-Specific Pre-Training and Cross Lingual Transfer for Code-Switched Data

Akshat Gupta, Sai Krishna Rallabandi, Alan Black

arXiv:2102.12407v11.615 citations h-index: 58

Originality: Synthesis-oriented

AI Analysis

This work addresses sentiment analysis for low-resource code-switched languages like Tamil-English and Malayalam-English, but it is incremental as it compares existing techniques without introducing new methods.

The paper tackled sentiment analysis for code-switched Dravidian languages by comparing task-specific pre-training and cross-lingual transfer, finding that task-specific pre-training achieved superior zero-shot and supervised performance over cross-lingual transfer from multilingual BERT models.

Using task-specific pre-training and leveraging cross-lingual transfer are two of the most popular ways to handle code-switched data. In this paper, we aim to compare the effects of both on the task of sentiment analysis. We work with two Dravidian code-switched languages, Tamil-English and Malayalam-English, and four different BERT-based models. We find that task-specific pre-training results in superior zero-shot and supervised performance compared with cross-lingual transfer from multilingual BERT models.
