CR AI LG NE | Aug 12, 2025

Enhance the machine learning algorithm performance in phishing detection with keyword features

arXiv:2508.09765v1 | CNIOT
Originality: Incremental advance
AI Analysis

This work addresses phishing attacks on internet users by improving detection accuracy, though it is incremental because it builds on existing machine learning methods.

The paper tackles phishing URL detection by enhancing machine learning algorithms through a novel feature selection method that combines keyword features with traditional ones, achieving a 30% average reduction in classification error on the large dataset and up to 99.68% accuracy.

Phishing attacks on the Internet have increased significantly in recent years. In a typical phishing attack, the attacker sets up a malicious website that resembles a legitimate one in order to obtain end users' information. This can lead to leakage of sensitive information and financial loss for end users. To prevent such attacks, early detection of these websites' URLs is vital. Previous researchers have proposed many machine learning algorithms to distinguish phishing URLs from legitimate ones. In this paper, we enhance these machine learning algorithms from the perspective of feature selection. We propose a novel method that incorporates keyword features alongside traditional features. The method is applied to multiple traditional machine learning algorithms, and the experimental results show that it is effective. On average, it reduces the classification error by 30% on the large dataset, and the improvement is more significant on the small dataset. In addition, the method extracts its information from the URL alone and does not rely on additional information provided by third-party services. The best result for a machine learning algorithm using our proposed method is an accuracy of 99.68%.
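To make the idea of combining keyword features with traditional URL features concrete, here is a minimal Python sketch. It is not the paper's implementation: the keyword list `KEYWORDS`, the particular lexical features, and the function names are all hypothetical choices made for illustration, since the abstract does not specify them. The only point it shows is that both feature groups can be computed from the URL string alone and concatenated into one vector for a classifier.

```python
from urllib.parse import urlparse

# Hypothetical keyword list for illustration only; the abstract does not
# say which keywords the paper actually uses.
KEYWORDS = ("login", "secure", "account", "verify", "update")


def traditional_features(url: str) -> list[float]:
    """A small, assumed set of lexical URL features."""
    host = urlparse(url).netloc
    return [
        float(len(url)),                       # total URL length
        float(url.count(".")),                 # number of dots in the URL
        float(sum(c.isdigit() for c in url)),  # number of digit characters
        float("@" in url),                     # 1.0 if '@' appears in the URL
        float(host.count("-")),                # hyphens in the host name
    ]


def keyword_features(url: str) -> list[float]:
    """One binary feature per keyword: 1.0 if the keyword occurs in the URL."""
    lowered = url.lower()
    return [float(keyword in lowered) for keyword in KEYWORDS]


def combined_features(url: str) -> list[float]:
    """Concatenate traditional and keyword features into one vector."""
    return traditional_features(url) + keyword_features(url)
```

A vector produced by `combined_features` could then be passed to any traditional classifier. Because nothing here queries an external service, the sketch is consistent with the abstract's claim that the method relies on the URL only.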
