CL LG | Mar 17, 2025

Harnessing Test-time Adaptation for NLU tasks Involving Dialects of English

arXiv:2503.12858v2 | 1 citation | h-index: 2

Originality: Synthesis-oriented

AI Analysis

This work addresses the scarcity of dialectal datasets in NLP and aims to improve generalization on tasks involving non-standard English dialects. It is incremental, since it applies an existing method to a new domain.

The paper tackles the problem of adapting natural language understanding models to English dialects without labeled data. It applies a test-time domain adaptation method (SHOT) and shows that it is viable, reports a positive correlation between the dialectal gap and SHOT's effectiveness, and finds that finetuning on Standard American English often outperforms dialectal finetuning.

Test-time domain adaptation (TTDA) is a method that helps models generalize across domains, tasks, and distributions without the use of labeled datasets. TTDA is therefore well suited to the dialectal setting in natural language processing (NLP), where models are often trained on Standard American English (SAE) but evaluated on Indian English (IndE), Singaporean English (SingE), or Nigerian English (NgE), whose distributions differ significantly from that of SAE. This is especially useful since dialectal datasets are scarce. In this paper, we explore one of the best-known TTDA techniques, SHOT, in dialectal NLP. We finetune and evaluate SHOT on different combinations of dialectal GLUE. Our findings show that SHOT is a viable technique when labeled datasets are unavailable. We also theoretically propose the concept of dialectal gap and show that it has a positive correlation with the effectiveness of SHOT. We also find that in many cases, finetuning on SAE yields higher performance than finetuning on dialectal data.
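The abstract names SHOT without describing its objective. As a rough illustration only, the sketch below computes the information-maximization term commonly associated with SHOT: it rewards predictions that are individually confident (low per-example entropy) and collectively spread across classes (high entropy of the batch-mean prediction). The function names and the pure-Python formulation are assumptions made for this example and are not taken from the paper. SHOT's pseudo-labeling step and the actual model update are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p + eps) for p in probs)


def information_maximization_loss(batch_logits):
    """Hypothetical helper: mean per-example entropy minus the entropy
    of the batch-mean prediction. Lower values mean the unlabeled batch
    is predicted confidently and with balanced class usage."""
    probs = [softmax(row) for row in batch_logits]
    n = len(probs)
    num_classes = len(probs[0])

    # Confidence term: average entropy of each individual prediction.
    mean_entropy = sum(entropy(p) for p in probs) / n

    # Diversity term: entropy of the averaged prediction over the batch.
    mean_probs = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    diversity = entropy(mean_probs)

    return mean_entropy - diversity


# Toy unlabeled target-dialect batches of two-class logits (made up).
confident_and_diverse = [[10.0, -10.0], [-10.0, 10.0]]
collapsed_to_one_class = [[10.0, -10.0], [10.0, -10.0]]

loss_diverse = information_maximization_loss(confident_and_diverse)
loss_collapsed = information_maximization_loss(collapsed_to_one_class)
```

In this toy setup the confident, balanced batch receives the lower loss, which is the behavior the objective is meant to encourage. In a real adaptation run this quantity would be minimized with respect to the encoder's parameters on unlabeled target-dialect text while the classifier head stays frozen.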
