SE, CL · Apr 27, 2021

Shellcode_IA32: A Dataset for Automatic Shellcode Generation

arXiv:2104.13100v4717 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for automated shellcode generation in cybersecurity, but its contribution is incremental: it focuses on creating a dataset and establishing baseline methods rather than proposing a new technique.

The authors tackled the problem of automatically generating shellcode from natural language comments by creating and releasing the Shellcode_IA32 dataset, and they established baseline performance using standard neural machine translation methods.

We take the first step to address the task of automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability, starting from natural language comments. We assemble and release a novel dataset (Shellcode_IA32), consisting of challenging but common assembly instructions with their natural language descriptions. We experiment with standard methods in neural machine translation (NMT) to establish baseline performance levels on this task.
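To make the task framing concrete, here is a minimal sketch of how a parallel corpus of natural language intents and assembly snippets might be represented and scored. The `Pair` class, the three example pairs, the `normalize` helper, and the exact-match metric are all assumptions made for illustration; the pairs are not taken from Shellcode_IA32, and the sketch does not reproduce the authors' NMT baselines or their evaluation setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One hypothetical corpus entry: an English intent and its IA-32 snippet."""
    intent: str
    snippet: str


# Illustrative pairs written for this sketch; NOT taken from Shellcode_IA32.
PAIRS = [
    Pair("push the contents of eax onto the stack", "push eax"),
    Pair("clear the eax register", "xor eax, eax"),
    Pair("move the stack pointer into ebx", "mov ebx, esp"),
]


def normalize(snippet: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(snippet.lower().replace(",", " , ").split())


def exact_match(predictions, references) -> float:
    """Fraction of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    references = [pair.snippet for pair in PAIRS]
    # Stand-in for model output: two correct lines and one wrong register.
    predictions = ["PUSH eax", "xor eax,eax", "mov ebx, ebp"]
    print(f"exact match: {exact_match(predictions, references):.2f}")
```

Exact match is a strict choice here: a snippet that differs from its reference in a single operand counts as wrong, which reflects the fact that one incorrect instruction can break a shellcode entirely.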

Code Implementations: 1 repo
