Oct 15, 2022

Classification of Web Phishing Kits for early detection by platform providers

arXiv:2210.08273v1 · 3 citations · h-index: 119
Originality: Incremental advance
AI Analysis

This work addresses the need of Web service providers to detect phishing kits early, although it represents an incremental advance in cybersecurity.

The authors tackle the classification of phishing kits for early detection by analyzing over 2000 kits to identify the evasion and obfuscation functions they use, and they evaluate whether supervised machine learning models can classify even kits that adopt novel techniques when only limited training data are available.
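To make the idea of "identifying evasion and obfuscation functions" in kit source code concrete, here is a minimal sketch of a deterministic, pattern-based feature extractor. It assumes a kit is given as a mapping from file name to PHP source text, and every pattern name and regular expression below is a hypothetical indicator chosen for illustration; this is not the feature set or the analysis tool used by the authors.

```python
import re
from typing import Dict

# Hypothetical indicator patterns for illustration only; these are NOT the
# features extracted in the paper.
EVASION_PATTERNS: Dict[str, str] = {
    "user_agent_check": r"\bHTTP_USER_AGENT\b",  # filtering visitors by User-Agent
    "ip_check": r"\bREMOTE_ADDR\b",              # filtering visitors by IP address
    "crawler_names": r"(?i)\b(googlebot|bingbot|phishtank)\b",  # hard-coded crawler names
}
OBFUSCATION_PATTERNS: Dict[str, str] = {
    "base64_decode": r"\bbase64_decode\s*\(",
    "eval": r"\beval\s*\(",
    "gzinflate": r"\bgzinflate\s*\(",
    "str_rot13": r"\bstr_rot13\s*\(",
}
ALL_PATTERNS: Dict[str, str] = {**EVASION_PATTERNS, **OBFUSCATION_PATTERNS}


def extract_features(kit_files: Dict[str, str]) -> Dict[str, int]:
    """Map each indicator to 1 if any source file of the kit matches it, else 0."""
    features = {name: 0 for name in ALL_PATTERNS}
    for source in kit_files.values():
        for name, pattern in ALL_PATTERNS.items():
            if features[name] == 0 and re.search(pattern, source):
                features[name] = 1
    return features


# Usage: a toy one-file kit (file name -> PHP source).
kit = {
    "index.php": "<?php if (strpos($_SERVER['HTTP_USER_AGENT'], 'Googlebot') !== false)"
                 " { http_response_code(404); exit; } eval(base64_decode('ZWNobyAxOw==')); ?>",
}
features = extract_features(kit)
vector = [features[name] for name in ALL_PATTERNS]  # fixed-order binary vector
print(vector)  # prints [1, 0, 1, 1, 1, 0, 0]
```

Binary presence flags keep the output deterministic and easy to audit, which suits a ground-truth step; a real analysis would also need to handle multi-layer obfuscation that plain regular expressions cannot see through.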

Phishing kits are tools that dark side experts provide to the community of criminal phishers to facilitate the construction of malicious Web sites. As these kits grow in sophistication, providers of Web-based services need to keep pace with their increasing complexity. We present an original classification of a corpus of over 2000 recent phishing kits according to the evasion and obfuscation functions they adopt. We carry out an initial deterministic analysis of the source code of the kits to extract the most discriminant features and information about their principal authors. We then complement this initial classification with supervised machine learning models. Thanks to the ground truth obtained in the first step, we can determine whether, and which, machine learning models are able to suitably classify even the kits that adopt novel evasion and obfuscation techniques unseen during the training phase. We compare different algorithms and evaluate their robustness in the realistic case in which only a small number of phishing kits are available for training. This paper represents an initial but important step toward supporting Web service providers and analysts in improving early detection mechanisms and intelligence operations for the phishing kits that might be installed on their platforms.
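The supervised step described in the abstract (training on a small labelled set, then classifying kits that may contain unseen techniques) can be illustrated with a short sketch. It assumes each kit has already been reduced to a fixed-order binary feature vector, uses hypothetical class labels, and swaps in a simple k-nearest-neighbour classifier with Hamming distance; the abstract does not name the algorithms the authors compared, so this is not presented as one of them.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A labelled kit: (binary feature vector, class label). Labels are hypothetical.
Sample = Tuple[Sequence[int], str]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train: List[Sample], x: Sequence[int], k: int = 3) -> str:
    """Majority label among the k training kits nearest to x by Hamming distance."""
    # sorted() is stable, so kits at equal distance keep their training order.
    neighbours = sorted(train, key=lambda s: hamming(s[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    # Ties in the vote go to the label that appears first among the sorted neighbours.
    return votes.most_common(1)[0][0]


def accuracy(train: List[Sample], test: List[Sample], k: int = 3) -> float:
    """Fraction of test kits whose predicted label equals the reference label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Usage: five training kits, feature order
# [user_agent, ip, crawler_names, base64, eval, gzinflate, rot13].
train: List[Sample] = [
    ([1, 1, 1, 0, 0, 0, 0], "evasion"),
    ([1, 0, 1, 0, 0, 0, 0], "evasion"),
    ([0, 0, 0, 1, 1, 0, 0], "obfuscation"),
    ([0, 0, 0, 1, 1, 1, 0], "obfuscation"),
    ([0, 0, 0, 0, 0, 0, 0], "none"),
]
# The second test kit uses rot13, a feature never set in the training data.
test_set: List[Sample] = [
    ([0, 1, 1, 0, 0, 0, 0], "evasion"),
    ([0, 0, 0, 1, 0, 0, 1], "obfuscation"),
    ([0, 0, 0, 0, 0, 0, 0], "none"),
]
print(accuracy(train, test_set))  # prints 1.0
```

Repeating the `accuracy` call with progressively smaller slices of `train` is one simple way to probe the small-training-set scenario the abstract mentions; with so few toy samples, the result says nothing about real kits.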
