LG CR NE SI SP
Mar 31, 2020

Deep Learning based Frameworks for Handling Imbalance in DGA, Email, and URL Data Analysis

Simran K, Prathiksha Balakrishna, Vinayakumar Ravi, Soman KP

arXiv:2004.04812v2 · 1.2

Originality: Synthesis-oriented

AI Analysis

This paper addresses data imbalance in cyber security applications, but the contribution is incremental: it applies an existing cost-sensitive approach to deep learning in specific domains.

The paper tackles imbalanced data in cyber security by proposing cost-sensitive deep learning frameworks, which outperform cost-insensitive methods across the DGA, email, and URL analysis use cases.

Deep learning is a state-of-the-art method for many applications. The main issue is that most real-time data is highly imbalanced. To avoid bias in training, a cost-sensitive approach can be used. In this paper, we propose cost-sensitive deep learning based frameworks and evaluate their performance on three different Cyber Security use cases: Domain Generation Algorithm (DGA), Electronic mail (Email), and Uniform Resource Locator (URL). Various experiments were performed using cost-insensitive as well as cost-sensitive methods, and the parameters for both were set by hyperparameter tuning. In all experiments, the cost-sensitive deep learning methods performed better than the cost-insensitive approaches. This is mainly because the cost-sensitive approach gives more importance during training to the classes that have very few samples, which helps the model learn all the classes more effectively.
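The cost-sensitive idea in the abstract can be illustrated with a class-weighted cross-entropy loss. The abstract does not say how the costs are chosen, so the sketch below assumes inverse-frequency weights (a common heuristic, not necessarily the authors' scheme); the function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: weight = n / (k * count), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of per-sample negative log-likelihoods.

    probs[i][c] is the predicted probability of class c for sample i.
    """
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        weight_sum += w
    return total / weight_sum


# Toy imbalanced data set: nine samples of class 0 and one of class 1.
labels = [0] * 9 + [1]
weights = inverse_frequency_weights(labels)

# A classifier that always favours the majority class.
majority_probs = [[0.9, 0.1]] * len(labels)
uniform_weights = {0: 1.0, 1: 1.0}

plain_loss = weighted_cross_entropy(majority_probs, labels, uniform_weights)
cost_sensitive_loss = weighted_cross_entropy(majority_probs, labels, weights)
```

With these weights the single minority sample contributes as much to the loss as the nine majority samples combined, so a model that ignores the minority class is penalized more heavily (`cost_sensitive_loss` exceeds `plain_loss`) than under the cost-insensitive loss.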
