CL CR · Nov 5, 2024

LLMs for Domain Generation Algorithm Detection

Reynier Leyva La O, Carlos A. Catania, Tatiana Parlanti

arXiv:2411.03307v1 · 4 citations · h-index: 4

Originality Synthesis-oriented

AI Analysis

This work addresses malware detection for cybersecurity practitioners, though it appears incremental because it applies existing LLM techniques to a specific domain.

This work tackles domain generation algorithm (DGA) detection using large language models (LLMs), showing that supervised fine-tuning (SFT) with domain-specific data achieves 94% accuracy with a 4% false positive rate, outperforming state-of-the-art attention-based models.
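For readers less familiar with the two reported metrics, the sketch below shows how accuracy and false positive rate are conventionally computed for a binary detector. It is an illustration only: the function name `accuracy_and_fpr`, the 1 = DGA / 0 = benign label encoding, and the toy labels are assumptions, not taken from the paper's evaluation code.

```python
def accuracy_and_fpr(y_true, y_pred):
    """Return (accuracy, false positive rate) for binary labels.

    Assumed encoding (hypothetical): 1 = DGA domain, 0 = benign domain.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1  # benign domain wrongly flagged as DGA
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1  # DGA domain missed by the detector
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # FPR is the share of benign domains that were flagged as DGA.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Toy labels, made up for illustration only.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
acc, fpr = accuracy_and_fpr(y_true, y_pred)
```

The false positive rate matters in this setting because benign domains typically far outnumber malicious ones in real traffic, so even a small FPR can translate into many false alerts.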

This work analyzes the use of large language models (LLMs) for detecting domain generation algorithms (DGAs). We perform a detailed evaluation of two important techniques, In-Context Learning (ICL) and Supervised Fine-Tuning (SFT), showing how they can improve detection. SFT increases performance by using domain-specific data, whereas ICL helps the detection model adapt quickly to new threats without requiring much retraining. We use Meta's Llama3 8B model on a custom dataset with 68 malware families and normal domains, covering several hard-to-detect schemes, including recent word-based DGAs. Results show that LLM-based methods can achieve competitive results in DGA detection. In particular, the SFT-based LLM DGA detector outperforms state-of-the-art models using attention layers, achieving 94% accuracy with a 4% false positive rate (FPR) and excelling at detecting word-based DGA domains.
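To make the In-Context Learning idea concrete, here is a minimal sketch of how a few-shot prompt for DGA classification might be assembled and how a model's reply might be parsed. This is not the authors' prompt or code: the prompt wording, the helper names `build_icl_prompt` and `parse_label`, and the example domains are all hypothetical, and the call to the LLM itself is deliberately left out.

```python
def build_icl_prompt(examples, query):
    """Build a few-shot prompt from (domain, label) pairs.

    The wording and layout are assumptions for illustration;
    labels are expected to be 'dga' or 'benign'.
    """
    lines = ["Classify each domain as 'dga' or 'benign'.", ""]
    for domain, label in examples:
        lines.append(f"Domain: {domain}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query domain goes last, with the label left for the model to fill in.
    lines.append(f"Domain: {query}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(completion):
    """Map a raw model completion to 'dga', 'benign', or None if unclear."""
    text = completion.strip().lower()
    if text.startswith("dga"):
        return "dga"
    if text.startswith("benign"):
        return "benign"
    return None


# Hypothetical labeled examples; real ones would come from a curated dataset.
few_shot = [
    ("xkqjzvbnwprt.com", "dga"),
    ("example.org", "benign"),
]
prompt = build_icl_prompt(few_shot, "qzpfkjwhxlmv.net")
```

Under this setup, adapting to a newly observed malware family would amount to swapping in a few labeled examples of that family, with no change to the model weights. SFT, by contrast, updates the weights using labeled domain data.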
