LG AI CR Dec 7, 2025

Deep Reinforcement Learning for Phishing Detection with Transformer-Based Semantic Features

arXiv:2512.06925v1

Originality: Incremental advance

AI Analysis

The paper targets phishing detection for cybersecurity applications, offering a robust, adaptive detection method with an incremental improvement in generalization.

This study tackles phishing detection by proposing a Quantile Regression Deep Q-Network (QR-DQN) that integrates RoBERTa semantic embeddings with lexical features. It achieves a test accuracy of 99.86% and reduces the generalization gap from 1.66% to 0.04% relative to a standard DQN with lexical features.

Phishing is a cybercrime in which individuals are deceived into revealing personal information, often resulting in financial loss. These attacks commonly occur through fraudulent messages, misleading advertisements, and compromised legitimate websites. This study proposes a Quantile Regression Deep Q-Network (QR-DQN) approach that integrates RoBERTa semantic embeddings with handcrafted lexical features to enhance phishing detection while accounting for uncertainties. Unlike traditional DQN methods that estimate single scalar Q-values, QR-DQN leverages quantile regression to model the distribution of returns, improving stability and generalization on unseen phishing data. A diverse dataset of 105,000 URLs was curated from PhishTank, OpenPhish, Cloudflare, and other sources, and the model was evaluated using an 80/20 train-test split. The QR-DQN framework achieved a test accuracy of 99.86%, precision of 99.75%, recall of 99.96%, and F1-score of 99.85%, demonstrating high effectiveness. Compared to standard DQN with lexical features, the hybrid QR-DQN with lexical and semantic features reduced the generalization gap from 1.66% to 0.04%, indicating significant improvement in robustness. Five-fold cross-validation confirmed model reliability, yielding a mean accuracy of 99.90% with a standard deviation of 0.04%. These results suggest that the proposed hybrid approach effectively identifies phishing threats, adapts to evolving attack strategies, and generalizes well to unseen data.
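The difference between a scalar Q-value and a quantile-based return distribution can be illustrated with a minimal sketch of the quantile Huber loss commonly used in QR-DQN. This is not the authors' implementation: the function names, the threshold `kappa = 1.0`, and the two-action setup (0 = legitimate, 1 = phishing) are assumptions made for illustration, and the abstract does not state the number of quantiles or the action encoding.

```python
from typing import List, Sequence


def quantile_midpoints(n: int) -> List[float]:
    """Quantile fractions tau_i = (2i + 1) / (2n) for i = 0..n-1."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def huber(u: float, kappa: float = 1.0) -> float:
    """Huber loss: quadratic for |u| <= kappa, linear beyond it."""
    a = abs(u)
    return 0.5 * u * u if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(
    pred: Sequence[float], target: Sequence[float], kappa: float = 1.0
) -> float:
    """Quantile Huber loss between predicted and target quantile values.

    Each predicted quantile is compared with every target sample; the
    error is weighted by |tau - 1{u < 0}|, so over- and under-estimates
    are penalized asymmetrically depending on the quantile fraction.
    The result is summed over predicted quantiles and averaged over
    target samples.
    """
    taus = quantile_midpoints(len(pred))
    total = 0.0
    for tau, theta in zip(taus, pred):
        for t in target:
            u = t - theta  # TD error for this (prediction, target) pair
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber(u, kappa)
    return total / len(target)


def greedy_action(quantiles_per_action: Sequence[Sequence[float]]) -> int:
    """Pick the action whose quantile values have the highest mean.

    The mean of the quantile values plays the role of the scalar
    Q-value that a standard DQN would estimate directly.
    """
    means = [sum(q) / len(q) for q in quantiles_per_action]
    return max(range(len(means)), key=means.__getitem__)


if __name__ == "__main__":
    # Hypothetical quantile estimates for one URL:
    # action 0 = "legitimate", action 1 = "phishing" (assumed encoding).
    estimates = [[-0.9, -0.4, 0.1, 0.3], [0.2, 0.6, 0.8, 1.0]]
    print("chosen action:", greedy_action(estimates))
    print("loss:", quantile_huber_loss(estimates[1], [1.0, 1.0, 1.0, 1.0]))
```

In a full agent the predicted quantiles would come from a network fed with the concatenated RoBERTa embeddings and lexical features, and the targets from a Bellman update; both are omitted here to keep the sketch self-contained.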
