A Survey of Code-switched Speech and Language Processing
The survey addresses the need to build intelligent systems for multilingual communities, but its contribution is incremental, since it synthesizes existing research.
This survey reviews computational approaches for processing code-switched speech and natural language, highlighting the scarcity of data and resources while listing available datasets and applications.
Code-switching, the alternation of languages within a conversation or utterance, is a common communicative phenomenon in multilingual communities across the world. This survey reviews computational approaches for code-switched speech and natural language processing. We motivate why processing code-switched text and speech is essential for building intelligent agents and systems that interact with users in multilingual communities. As code-switching data and resources are scarce, we list what is available for various code-switched language pairs, along with the language processing tasks each resource can be used for. We review code-switching research in various speech and NLP applications, including language processing tools and end-to-end systems. We conclude with future directions and open problems in the field.