Catching the Phish: Detecting Phishing Attacks using Recurrent Neural Networks (RNNs)
The work addresses the need for automated detection of phishing content, so that malicious emails can be stopped before they reach users. Its contribution is incremental, since it builds on existing machine learning techniques.
The paper tackles phishing detection in emails by developing a classifier based on a recurrent neural network (RNN) that identifies features without human input. The authors report that it outperforms state-of-the-art tools.
The emergence of online services in our daily lives has been accompanied by a range of malicious attempts to trick individuals into performing undesired actions, often to the benefit of the adversary. The most popular medium of these attempts is phishing attacks, particularly through emails and websites. In order to defend against such attacks, there is an urgent need for automated mechanisms to identify this malevolent content before it reaches users. Machine learning techniques have gradually become the standard for such classification problems. However, identifying common measurable features of phishing content (e.g., in emails) is notoriously difficult. To address this problem, we engage in a novel study into a phishing content classifier based on a recurrent neural network (RNN), which identifies such features without human input. At this stage, we scope our research to emails, but our approach can be extended to apply to websites. Our results show that the proposed system outperforms state-of-the-art tools. Furthermore, our classifier is efficient and takes into account only the text and, in particular, the textual structure of the email. Since these features are rarely considered in email classification, we argue that our classifier can complement existing classifiers with high information gain.
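To make the idea of classifying an email from its text and textual structure alone more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical tokenizer that keeps punctuation and line breaks as tokens (so that structure is preserved) and a simple Elman-style recurrent cell with random, untrained weights. The names tokenize, build_vocab, and TinyRNN, the markers <NL> and <NUM>, and all dimensions are illustrative assumptions; the abstract does not specify the actual architecture or preprocessing.

```python
import math
import random
import re

# Hypothetical tokenizer: words, digit runs, single punctuation marks, and
# line breaks each become one token, so textual structure is not discarded.
TOKEN_RE = re.compile(r"\n|[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def tokenize(text):
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok == "\n":
            tokens.append("<NL>")    # assumed marker for a line break
        elif tok.isdigit():
            tokens.append("<NUM>")   # assumed marker for any number
        else:
            tokens.append(tok.lower())
    return tokens


def build_vocab(token_lists):
    # Index 0 is reserved for tokens not seen when the vocabulary was built.
    vocab = {"<UNK>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


class TinyRNN:
    """Elman-style RNN forward pass with random (untrained) weights."""

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=8, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.embed = mat(vocab_size, embed_dim)   # token embeddings
        self.w_xh = mat(hidden_dim, embed_dim)    # input-to-hidden weights
        self.w_hh = mat(hidden_dim, hidden_dim)   # hidden-to-hidden weights
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.hidden_dim = hidden_dim

    def forward(self, token_ids):
        h = [0.0] * self.hidden_dim
        for idx in token_ids:
            x = self.embed[idx]
            # h_t = tanh(W_xh x_t + W_hh h_{t-1})
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_xh[j], x))
                    + sum(w * hi for w, hi in zip(self.w_hh[j], h))
                )
                for j in range(self.hidden_dim)
            ]
        # Sigmoid over the final hidden state gives a score in (0, 1).
        logit = sum(w * hi for w, hi in zip(self.w_out, h))
        return 1.0 / (1.0 + math.exp(-logit))


# Usage: score one made-up email; the value is meaningless until trained.
email = "Dear user,\nVerify your account 123!"
tokens = tokenize(email)
vocab = build_vocab([tokens])
ids = [vocab.get(tok, 0) for tok in tokens]
print(TinyRNN(len(vocab)).forward(ids))
```

Because the weights are random, the printed score carries no meaning. In a real system the embeddings and recurrent weights would be learned from labeled emails, which is how such a model can pick up discriminative features without hand-crafted feature engineering.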