CR | AI | May 30, 2025

A Reward-driven Automated Webshell Malicious-code Generator for Red-teaming

arXiv:2505.24252v1 | 1 citations | h-index: 1 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the shortage of malicious-code datasets for red-teaming in network security. The advance is incremental because it builds on existing LLM and reinforcement learning techniques.

The paper tackles the shortage of diverse and effective webshell malicious-code datasets by proposing RAWG, a reward-driven generator that combines a fine-tuned LLM with reinforcement learning. The authors report significant improvements in payload diversity and escape effectiveness over state-of-the-art methods.

Frequent cyber-attacks have elevated WebShell exploitation and defense to a critical research focus within network security. However, there remains a significant shortage of publicly available, well-categorized malicious-code datasets organized by obfuscation method. Existing malicious-code generation methods, which primarily rely on prompt engineering, often suffer from limited diversity and high redundancy in the payloads they produce. To address these limitations, we propose RAWG, a Reward-driven Automated Webshell Malicious-code Generator designed for red-teaming applications. Our approach begins by categorizing webshell samples from common datasets into seven distinct types of obfuscation. We then employ a large language model (LLM) to extract and normalize key tokens from each sample, creating a standardized, high-quality corpus. Using this curated dataset, we perform supervised fine-tuning (SFT) on an open-source large model to enable the generation of diverse, highly obfuscated webshell malicious payloads. To further enhance generation quality, we apply Proximal Policy Optimization (PPO), treating malicious-code samples as "chosen" data and benign code as "rejected" data during reinforcement learning. Extensive experiments demonstrate that RAWG significantly outperforms current state-of-the-art methods in both payload diversity and escape effectiveness.

