Prevalence of code mixing in semi-formal patient communication in low resource languages of South Africa
This paper addresses code-mixing in patient communication for public health services in low-resource languages, but its contribution is incremental: it applies an existing method to new data.
The paper tackles code-mixing in low-resource language settings by analyzing 182k unique patient questions from a South African public health platform, finding code-switching in approximately 10% of them, a level likely to challenge future services.
In this paper we address the problem of code-mixing in resource-poor language settings. We examine data consisting of 182k unique questions generated by users of the MomConnect helpdesk, part of a national-scale public health platform in South Africa. We show evidence of code-switching at the level of approximately 10% within this dataset, a level that is likely to pose challenges for future services. We use a natural language processing library (Polyglot) that supports detection of 196 languages and attempt to evaluate its performance at identifying English, isiZulu and code-mixed questions.
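As a rough illustration of what evaluating a language identifier on English, isiZulu and code-mixed questions could involve, the sketch below computes per-label precision and recall from gold and predicted labels. Everything in it is an assumption made for illustration: the label names (`en`, `zu`, `mixed`), the `toy_detector` word-list stand-in (which is not Polyglot), and the four example questions (invented, not taken from the MomConnect data). The abstract does not specify the evaluation procedure, so this is not the authors' method. In a real run, a wrapper around Polyglot's detector would presumably replace `toy_detector`.

```python
from collections import Counter

# Hypothetical label set: English, isiZulu, code-mixed.
LABELS = ("en", "zu", "mixed")

# Tiny hand-picked isiZulu word list, for illustration only.
ZULU_WORDS = {"ngicela", "usizo", "ingane", "yami", "ngifuna"}


def toy_detector(text):
    """Stand-in language identifier (NOT Polyglot): counts known isiZulu tokens."""
    tokens = text.lower().split()
    zulu_hits = sum(token in ZULU_WORDS for token in tokens)
    if zulu_hits == 0:
        return "en"
    if zulu_hits == len(tokens):
        return "zu"
    return "mixed"


def per_label_scores(gold, predicted):
    """Return {label: (precision, recall)}; a zero denominator yields 0.0."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = {}
    for label in LABELS:
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        scores[label] = (precision, recall)
    return scores


# Invented example questions with assumed gold labels.
samples = [
    ("when is my next clinic visit", "en"),
    ("ngicela usizo", "zu"),
    ("ngicela usizo with my baby", "mixed"),
    ("ingane yami is sick", "mixed"),
]
gold = [label for _, label in samples]
predicted = [toy_detector(text) for text, _ in samples]
scores = per_label_scores(gold, predicted)
mixed_share = predicted.count("mixed") / len(predicted)
```

Reporting precision and recall per label, rather than a single accuracy figure, matters here because code-mixed questions are a minority class (roughly 10% in the dataset described above), so a detector could score well overall while missing most of them.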