Ayan Roy

17.3 | CR | Apr 13

A Synthetic Conversational Smishing Dataset for Social Engineering Detection

Carl Lochstampfor, Ayan Roy

Smishing (SMS phishing) has become a serious cybersecurity threat, especially for elderly and cyber-unaware individuals, causing financial loss and undermining user trust. Although prior work has focused on detecting smishing at the level of individual messages, real-world attackers often rely on multi-stage social engineering, gradually manipulating victims through extended conversations before attempting to steal sensitive information. Despite the existence of several datasets for single-message smishing detection, datasets capturing conversational smishing remain largely unavailable, limiting research on multi-turn attack detection. To address this gap, this paper presents a synthetically generated dataset of 3,201 labeled multi-round conversations designed to emulate realistic conversational smishing attacks. The dataset reflects diverse attacker strategies and victim responses across multiple stages of interaction. Using this dataset, we establish baseline performance by evaluating eight models, including traditional machine learning approaches (Logistic Regression, Random Forest, Linear SVM, and XGBoost) and transformer-based architectures (DistilBERT and Longformer), with both engineered conversational features and TF-IDF text representations. Experimental results show that TF-IDF-based models consistently outperform those using engineered features alone. The best-performing model, XGBoost with TF-IDF features, achieves 72.5% accuracy and a macro F1 score of 0.691, surpassing both transformer models. Our analysis suggests that transformer performance is limited primarily by input-length constraints and the relatively small size of the training data. Overall, the results highlight the value of lexical signals in conversational smishing detection and demonstrate the usefulness of the proposed dataset for advancing research on defenses against multi-turn social engineering attacks.
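The abstract reports both accuracy and macro F1 and relies on TF-IDF representations of whole conversations. The following is a minimal sketch of those two ideas in plain Python, not the authors' pipeline: the function names, the whitespace tokenizer, the smoothed IDF formula, and the toy labels ("smish", "benign") are all assumptions made for illustration.

```python
import math
from collections import Counter


def conversation_to_tokens(turns):
    """Flatten a multi-turn conversation into one lowercase token list.
    (Assumption: whitespace tokenization; the paper's tokenizer is not stated.)"""
    return " ".join(turns).lower().split()


def tfidf_vectors(conversations):
    """Return one {term: weight} dict per conversation.
    Uses term frequency times a smoothed IDF: ln((1 + N) / (1 + df)) + 1."""
    docs = [conversation_to_tokens(turns) for turns in conversations]
    n_docs = len(docs)
    # Document frequency: in how many conversations each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because macro F1 averages per-class scores without weighting by class size, it can fall below accuracy when a model does worse on one class than another, which is one plausible reading of the gap between the two figures reported above.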

CR | Sep 22, 2019

Secured Traffic Monitoring in VANET

Ayan Roy, Sanjay Madria

Vehicular Ad hoc Networks (VANETs) enable vehicles to communicate wirelessly with neighboring vehicles as well as with roadside units (RSUs). However, inaccurate information within the network can cause traffic aberrations and disrupt the normal functioning of any traffic monitoring system. Determining the credibility of broadcast messages originating from the region of interest (ROI) is therefore crucial in a malicious environment. Additionally, a breach of privacy involving a vehicle's private information, such as location and velocity, can lead to severe consequences like unauthorized tracking and masquerading attacks. We therefore propose an edge-cloud-based, privacy-preserving, secured decision-making model that employs a heuristic based on vehicular data, such as GPS location and velocity, to authenticate traffic-related information from the ROI under different traffic scenarios such as congestion. The effectiveness of the proposed model has been validated using the VENTOS, SUMO, and OMNeT++ simulators, as well as a simulated cloud environment. We compare our proposed model to the existing peer-based authentication model, the majority voting model, and the reputation-based system under different attack scenarios. We show that our model filters malicious vehicles effectively and provides accurate traffic information as long as at least one non-malicious vehicle is present within the ROI.
