CL LG SI | Feb 3, 2025

Hybrid Machine Learning Model for Detecting Bangla Smishing Text Using BERT and Character-Level CNN

Gazi Tanbhir, Md. Farhan Shahriyar, Khandker Shahed, Abdullah Md Raihan Chy, Md Al Adnan

arXiv:2502.01518v14.911 citations | h-index: 3 | ICECE

Originality: Incremental advance

AI Analysis

This work addresses the growing threat that smishing poses to mobile users in Bangla-speaking regions. Smishing attacks have surged by 328% and caused over $54.2 million in losses in 2019. The contribution is an incremental improvement over existing methods.

The paper tackles the problem of detecting smishing (SMS phishing) attacks in Bangla text by developing a hybrid model that combines BERT and character-level CNNs, achieving 98.47% accuracy in multi-class classification.

Smishing is a social engineering attack using SMS containing malicious content to deceive individuals into disclosing sensitive information or transferring money to cybercriminals. Smishing attacks have surged by 328%, posing a major threat to mobile users, with losses exceeding $54.2 million in 2019. Despite its growing prevalence, the issue remains significantly under-addressed. This paper presents a novel hybrid machine learning model for detecting Bangla smishing texts, combining Bidirectional Encoder Representations from Transformers (BERT) with Convolutional Neural Networks (CNNs) for enhanced character-level analysis. Our model addresses multi-class classification by distinguishing between Normal, Promotional, and Smishing SMS. Unlike traditional binary classification methods, our approach integrates BERT's contextual embeddings with CNN's character-level features, improving detection accuracy. Enhanced by an attention mechanism, the model effectively prioritizes crucial text segments. Our model achieves 98.47% accuracy, outperforming traditional classifiers, with high precision and recall in Smishing detection, and strong performance across all categories.
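
The data flow described in the abstract (a contextual view and a character-level view, combined by attention and fed to a three-way classifier) can be illustrated with a toy sketch. This is not the authors' implementation: the weights are random and untrained, `placeholder_context_vector` is a hypothetical stand-in for a BERT sentence embedding, the scalar character encoding and the two-view attention are simplifying assumptions, and the sample message is made up. Only the three class names come from the abstract.

```python
import math
import random

CLASSES = ["Normal", "Promotional", "Smishing"]  # the three labels named in the abstract
DIM = 4  # toy feature size, chosen only for illustration

rng = random.Random(0)  # fixed seed; all weights below are random and untrained


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def char_cnn_features(text, filters, width=3):
    """1D convolution over characters, ReLU, then max-over-time pooling."""
    # Assumption: one scalar per character; a real model would use learned embeddings.
    codes = [(ord(ch) % 256) / 255.0 for ch in text]
    codes += [0.0] * max(0, width - len(codes))  # pad very short inputs
    feats = []
    for f in filters:
        acts = [max(0.0, dot(f, codes[i:i + width]))
                for i in range(len(codes) - width + 1)]
        feats.append(max(acts))
    return feats


def placeholder_context_vector(text):
    """Hypothetical stand-in for a BERT sentence embedding; NOT a real encoder."""
    local = random.Random(sum(ord(ch) for ch in text))
    return [local.uniform(-1.0, 1.0) for _ in range(DIM)]


FILTERS = rand_matrix(DIM, 3)      # DIM convolution filters of width 3
QUERY = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]  # attention query vector
CLASS_W = rand_matrix(len(CLASSES), DIM)  # linear classifier weights


def predict(text):
    context_vec = placeholder_context_vector(text)
    char_vec = char_cnn_features(text, FILTERS)
    # Attention over the two views: score each against the query, then mix.
    weights = softmax([dot(QUERY, context_vec), dot(QUERY, char_vec)])
    fused = [weights[0] * c + weights[1] * h for c, h in zip(context_vec, char_vec)]
    probs = softmax([dot(row, fused) for row in CLASS_W])
    return CLASSES[probs.index(max(probs))], probs, weights


# Made-up sample message; the predicted label is meaningless without training.
label, probs, weights = predict("আপনার অ্যাকাউন্ট বন্ধ হবে, লিংকে ক্লিক করুন")
print(label, [round(p, 3) for p in probs], [round(w, 3) for w in weights])
```

In a trained model, the placeholder would be replaced by a real encoder output and all weights would be learned jointly. The sketch only shows how character-level features, which can pick up obfuscated spellings and URLs, might be weighed against contextual features before classification.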
