CL · Nov 13, 2019

Prevalence of code mixing in semi-formal patient communication in low resource languages of South Africa

Monika Obrocka, Charles Copley, Themba Gqaza, Eli Grant

arXiv:1911.05636v3 · 4 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses code-mixing in patient communication for public health services in low-resource languages. The contribution is incremental, because it applies an existing method to new data.

The paper tackles code-mixing in low-resource language settings by analyzing 182k patient questions from a South African health platform. It finds a code-switching rate of approximately 10%, a level that could pose challenges for future services.

In this paper we address the problem of code-mixing in resource-poor language settings. We examine data consisting of 182k unique questions generated by users of the MomConnect helpdesk, part of a national scale public health platform in South Africa. We show evidence of code-switching at the level of approximately 10% within this dataset, a level that is likely to pose challenges for future services. We use a natural language processing library (Polyglot) that supports detection of 196 languages and attempt to evaluate its performance at identifying English, isiZulu and code-mixed questions.
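The abstract does not say exactly how a question is counted as code-mixed. The following is a minimal sketch, assuming a detector that returns several (language code, confidence) pairs per question. Polyglot's `Detector(text).languages`, for example, exposes a `code` and a `confidence` for each of its top guesses. Under that assumption, a question is called code-mixed when two or more languages clear a confidence threshold, and prevalence is the share of such questions. The 20% threshold, the helper names `label_question` and `code_mixing_rate`, and the sample inputs are all hypothetical. This is not the authors' procedure.

```python
from typing import Iterable, List, Tuple

# One question's detector output: (language_code, confidence_percent) pairs.
# These inputs are hypothetical and stand in for a real detector's results.
Detection = List[Tuple[str, float]]


def label_question(detections: Detection, min_confidence: float = 20.0) -> str:
    """Return a single language code, 'mixed', or 'unknown' for one question.

    A language counts as present if its confidence is at least
    min_confidence (an assumed threshold). Two or more present
    languages make the question 'mixed'.
    """
    present = sorted({code for code, conf in detections if conf >= min_confidence})
    if not present:
        return "unknown"
    if len(present) == 1:
        return present[0]
    return "mixed"


def code_mixing_rate(questions: Iterable[Detection], min_confidence: float = 20.0) -> float:
    """Fraction of all questions labelled 'mixed' (0.0 for an empty input)."""
    labels = [label_question(q, min_confidence) for q in questions]
    if not labels:
        return 0.0
    return labels.count("mixed") / len(labels)


if __name__ == "__main__":
    # Hypothetical detector outputs for four questions (English, isiZulu, mixed, English).
    sample = [
        [("en", 97.0)],
        [("zu", 95.0)],
        [("zu", 60.0), ("en", 38.0)],
        [("en", 99.0)],
    ]
    print(label_question(sample[2]))   # mixed
    print(code_mixing_rate(sample))    # 0.25
```

The threshold trades missed mixing (set too high) against spurious "mixed" labels from low-confidence guesses (set too low), so any real estimate would need checking against hand-labelled questions. At the paper's scale, a rate of about 10% over 182k questions corresponds to roughly 18,000 code-mixed questions.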
