CL · LG · SI · Feb 3, 2025

Hybrid Machine Learning Model for Detecting Bangla Smishing Text Using BERT and Character-Level CNN

arXiv:2502.01518v1 · 11 citations · h-index: 3 · ICECE
Originality: Incremental advance
AI Analysis

The work addresses the growing threat of smishing to mobile users in Bangla-speaking regions. The abstract reports that smishing attacks have surged by 328% and caused losses exceeding $54.2 million in 2019. The contribution is an incremental improvement over existing methods.

The paper tackles the problem of detecting smishing (SMS phishing) attacks in Bangla text by developing a hybrid model that combines BERT and character-level CNNs, achieving 98.47% accuracy in multi-class classification.

Smishing is a social engineering attack using SMS containing malicious content to deceive individuals into disclosing sensitive information or transferring money to cybercriminals. Smishing attacks have surged by 328%, posing a major threat to mobile users, with losses exceeding $54.2 million in 2019. Despite its growing prevalence, the issue remains significantly under-addressed. This paper presents a novel hybrid machine learning model for detecting Bangla smishing texts, combining Bidirectional Encoder Representations from Transformers (BERT) with Convolutional Neural Networks (CNNs) for enhanced character-level analysis. Our model addresses multi-class classification by distinguishing between Normal, Promotional, and Smishing SMS. Unlike traditional binary classification methods, our approach integrates BERT's contextual embeddings with CNN's character-level features, improving detection accuracy. Enhanced by an attention mechanism, the model effectively prioritizes crucial text segments. Our model achieves 98.47% accuracy, outperforming traditional classifiers, with high precision and recall in Smishing detection, and strong performance across all categories.
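
The fusion idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not say how the two feature streams are merged, so plain concatenation is assumed here, the attention mechanism is omitted, and the weights are random and untrained. The names `char_cnn_features`, `classify`, and `init_params`, the filter width of 3, the character hashing, and the 8-dimensional `context_vec` (a placeholder for a real BERT sentence embedding) are all hypothetical choices made for illustration.

```python
import math
import random

CLASSES = ["Normal", "Promotional", "Smishing"]
WIDTH = 3  # assumed convolution window, in characters


def char_cnn_features(text, emb, filters):
    """Character-level 1D convolution, ReLU, then max-over-time pooling."""
    # Hash each character's code point into a small embedding table.
    vecs = [emb[ord(ch) % len(emb)] for ch in text]
    # Zero-pad so very short messages still produce one full window.
    while len(vecs) < WIDTH:
        vecs.append([0.0] * len(emb[0]))
    pooled = []
    for f in filters:  # each filter is a flat list of WIDTH * emb_dim weights
        best = 0.0  # starting at 0 acts as the ReLU floor
        for i in range(len(vecs) - WIDTH + 1):
            window = [x for v in vecs[i:i + WIDTH] for x in v]
            best = max(best, sum(w * x for w, x in zip(f, window)))
        pooled.append(best)
    return pooled


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text, context_vec, params):
    """Concatenate contextual and character features, then apply a linear layer."""
    char_vec = char_cnn_features(text, params["emb"], params["filters"])
    fused = list(context_vec) + char_vec  # assumed fusion: simple concatenation
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(params["W"], params["b"])]
    return dict(zip(CLASSES, softmax(logits)))


def init_params(ctx_dim, emb_dim=4, n_buckets=64, n_filters=5, seed=0):
    """Random, untrained weights; a real model would learn these from data."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "emb": [rand(emb_dim) for _ in range(n_buckets)],
        "filters": [rand(WIDTH * emb_dim) for _ in range(n_filters)],
        "W": [rand(ctx_dim + n_filters) for _ in CLASSES],
        "b": [0.0] * len(CLASSES),
    }


params = init_params(ctx_dim=8)
context_vec = [0.1] * 8  # placeholder standing in for a BERT sentence embedding
# Hypothetical example message ("Verify your account"), not from the paper's dataset.
probs = classify("আপনার অ্যাকাউন্ট যাচাই করুন", context_vec, params)
print(probs)
```

Because the weights are untrained, the printed probabilities carry no meaning. The sketch only shows how a contextual vector and pooled character-level features can feed one three-way classifier over Normal, Promotional, and Smishing.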

