CL · Sep 13, 2021

On Language Models for Creoles

arXiv:2109.06074v1663 citations
Originality: Incremental advance
AI Analysis

This work provides resources and insights for NLP researchers and practitioners working with under-resourced creole languages, though it is incremental in its methodological approach.

This paper addresses the under-resourcing of creole languages in NLP by collecting corpora and releasing models for Haitian Creole, Nigerian Pidgin English, and Singaporean Colloquial English. It finds that standard language models outperform distributionally robust ones on the evaluation tasks. The accompanying analysis attributes this to the relative stability of creoles rather than to over-parameterization.

Creole languages such as Nigerian Pidgin English and Haitian Creole are under-resourced and largely ignored in the NLP literature. Creoles typically result from the fusion of a foreign language with multiple local languages, and which grammatical and lexical features transfer to the creole is determined by a complex process. While creoles are generally stable, some features may be much more prominent among certain demographics or in certain linguistic situations. This paper makes several contributions. We collect existing corpora and release models for Haitian Creole, Nigerian Pidgin English, and Singaporean Colloquial English. We evaluate these models on intrinsic and extrinsic tasks. Motivated by the above literature, we compare standard language models with distributionally robust ones and find that, somewhat surprisingly, the standard language models are superior to the distributionally robust ones. We investigate whether this is an effect of over-parameterization or of relative distributional stability. We find that the difference persists in the absence of over-parameterization and that drift is limited, confirming the relative stability of creole languages.
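To make the comparison between standard and distributionally robust training concrete, here is a minimal sketch of how the two objectives differ. It is not the authors' implementation. It assumes a group-based form of robustness, where the objective is the loss of the worst-performing group; the abstract does not say which formulation was used. The function names, loss values, and group labels are all hypothetical.

```python
from collections import defaultdict


def average_loss(losses):
    """Standard objective: the mean loss over all examples."""
    return sum(losses) / len(losses)


def worst_group_loss(losses, groups):
    """Robust objective (assumed group-based form): the mean loss of the
    group that currently has the highest mean loss."""
    per_group = defaultdict(list)
    for loss, group in zip(losses, groups):
        per_group[group].append(loss)
    return max(sum(values) / len(values) for values in per_group.values())


# Hypothetical per-example losses; group "b" is small and poorly modelled.
losses = [1.0, 1.0, 1.0, 4.0]
groups = ["a", "a", "a", "b"]

standard = average_loss(losses)
robust = worst_group_loss(losses, groups)
```

On this toy data the standard objective is dominated by the majority group, while the robust objective is determined entirely by the minority group. If the groups differ little, as the abstract's finding of limited drift suggests for creoles, the two objectives rank models similarly and the robust one has less to offer.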

Code Implementations: 1 repo