Automatic Identification of Motivation for Code-Switching in Speech Transcripts
This work addresses the need for automated analysis of code-switching motivations in multilingual speech, which has linguistic and cultural implications, but it is incremental in that it applies existing methods to a new dataset.
The paper tackles the problem of automatically identifying motivations for code-switching in speech transcripts by annotating a new Spanish-English dataset and building the first system for this task, achieving 75% accuracy on Spanish-English and 66% on Hindi-English.
Code-switching, or switching between languages, occurs for many reasons and has important linguistic, sociological, and cultural implications. Multilingual speakers code-switch for a variety of purposes, such as expressing emotions, borrowing terms, making jokes, and introducing a new topic. The reason for a given code-switch can be quite useful for analysis, but it is not readily apparent. To remedy this situation, we annotate a new dataset of motivations for code-switching in Spanish-English. We build the first system (to our knowledge) to automatically identify a wide range of motivations for which speakers code-switch in everyday speech, achieving an accuracy of 75% across all motivations. Additionally, we show that the system can be adapted to new language pairs, achieving 66% accuracy on Hindi-English and demonstrating the cross-lingual applicability of our annotation scheme.