CR | LG | Mar 28, 2020

Real-Time Detection of Dictionary DGA Network Traffic using Deep Learning

arXiv:2003.12805v1 | 55 citations
Originality: Incremental advance
AI Analysis

The paper addresses a critical security problem for large enterprises, particularly in finance: real-time detection of stealthy malware communication. It is nonetheless an incremental advance in DGA detection methods.

The paper targets botnet and malware communication that uses dictionary-based domain generation algorithms (DGAs) to evade static detection. It introduces Bilbo, a hybrid neural network that runs a CNN and an LSTM in parallel. Bilbo was the most consistent of the compared architectures in AUC, F1 score, and accuracy across dictionary DGA classification tasks, and in four hours of real-world traffic it flagged at least five potential command-and-control networks that commercial tools missed.

Botnets and malware continue to avoid detection by static rules engines when using domain generation algorithms (DGAs) for callouts to unique, dynamically generated web addresses. Common DGA detection techniques fail to reliably detect DGA variants that combine random dictionary words to create domain names that closely mirror legitimate domains. To combat this, we created a novel hybrid neural network, Bilbo the `bagging` model, that analyses domains and scores the likelihood they are generated by such algorithms and therefore are potentially malicious. Bilbo is the first parallel usage of a convolutional neural network (CNN) and a long short-term memory (LSTM) network for DGA detection. Our unique architecture is found to be the most consistent in performance in terms of AUC, F1 score, and accuracy when generalising across different dictionary DGA classification tasks compared to current state-of-the-art deep learning architectures. We validate using reverse-engineered dictionary DGA domains and detail our real-time implementation strategy for scoring real-world network logs within a large financial enterprise. In four hours of actual network traffic, the model discovered at least five potential command-and-control networks that commercial vendor tools did not flag.
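To make the threat model concrete, the sketch below shows how a dictionary DGA can build domains by concatenating words, and how such a domain might be turned into integer indices before being fed to a character-level network. It is an illustration only: the word list, the function names `toy_dictionary_dga` and `encode_domain`, the alphabet, and the `max_len` of 32 are all assumptions made for this example and are not taken from the paper or from any real malware family.

```python
import random
import string

# Hypothetical word list; real dictionary DGAs embed their own lists.
WORDS = ["river", "stone", "market", "silver", "cloud", "garden", "paper", "window"]

# Assumed character vocabulary. Index 0 is reserved for padding and is
# also used for any character outside the alphabet.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def toy_dictionary_dga(seed, count, words_per_domain=2, tld="com"):
    """Generate `count` domains by concatenating randomly chosen words."""
    rng = random.Random(seed)  # same seed always yields the same domains
    return [
        "".join(rng.choice(WORDS) for _ in range(words_per_domain)) + "." + tld
        for _ in range(count)
    ]


def encode_domain(domain, max_len=32):
    """Map a domain to a fixed-length list of integer character indices."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in domain.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))  # right-pad with zeros


if __name__ == "__main__":
    for domain in toy_dictionary_dga(seed=7, count=3):
        print(domain, encode_domain(domain)[:12])
```

Because the generated names are made of ordinary words, they lack the high-entropy character patterns that detectors of random-looking DGAs tend to rely on, which is why the abstract describes them as closely mirroring legitimate domains.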

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
