A Multi-cascaded Deep Model for Bilingual SMS Classification
This work addresses bilingual SMS classification for public-service applications. The contribution is incremental, as it builds on existing deep learning approaches to text classification.
The authors tackle the classification of bilingual SMS texts, which mix languages and are informal and noisy, by proposing a multi-cascaded deep learning model called McM that learns from n-gram-level information and long-term dependencies without external knowledge. The model achieves high accuracy on a 12-class bilingual dataset of Roman Urdu and English SMS and outperforms a previous model for multilingual text classification.
Most studies on text classification focus on the English language. However, short texts such as SMS are influenced by regional languages. This makes automatic text classification challenging due to the multilingual, informal, and noisy nature of the language in the text. In this work, we propose a novel multi-cascaded deep learning model called McM for bilingual SMS classification. McM exploits $n$-gram level information as well as long-term dependencies of text for learning. Our approach aims to learn a model without any code-switching indication, lexical normalization, language translation, or language transliteration. The model relies entirely upon the text, as no external knowledge base is utilized for learning. For this purpose, a 12-class bilingual text dataset is developed from SMS feedback of citizens on public services, containing mixed Roman Urdu and English. Our model achieves high accuracy for classification on this dataset and outperforms the previous model for multilingual text classification, highlighting the language independence of McM.
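To make "$n$-gram level information" concrete, the following minimal sketch enumerates word $n$-grams directly from raw, unnormalized text. It is an illustration only and not the McM architecture: the abstract does not specify how McM extracts such features. The function names `word_ngrams` and `ngram_counts` and the example message are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams (as tuples) from whitespace-split raw text.

    No lexical normalization, translation, or transliteration is applied,
    mirroring the abstract's constraint that only the text itself is used.
    """
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n_values=(1, 2, 3)):
    """Count all n-grams of the given orders in a single message."""
    counts = Counter()
    for n in n_values:
        counts.update(word_ngrams(text, n))
    return counts


# Hypothetical code-mixed message (invented for illustration, not from the dataset).
sms = "bill bohat zyada hai please check"
for gram, count in ngram_counts(sms, n_values=(2,)).items():
    print(" ".join(gram), count)
```

Explicit counts like these capture only local word order. Capturing the long-term dependencies mentioned in the abstract would require a sequence model on top, which this sketch does not attempt.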