CR AI Nov 16, 2025

SeedAIchemy: LLM-Driven Seed Corpus Generation for Fuzzing

arXiv:2511.12448v1
1 citation
Originality: Incremental advance
AI Analysis

This tool makes it easier for developers to implement fuzzing effectively, addressing a domain-specific problem in software testing.

The authors tackle the problem of generating high-quality seed corpora for fuzzing by introducing SeedAIchemy, an automated LLM-driven tool whose five modules collect publicly available files from the internet. The results show that corpora generated by SeedAIchemy perform significantly better than a naive corpus and similarly to a manually curated corpus across diverse target programs and libraries.

We introduce SeedAIchemy, an automated LLM-driven corpus generation tool that makes it easier for developers to implement fuzzing effectively. SeedAIchemy consists of five modules that implement different approaches to collecting publicly available files from the internet. Four of the five modules use large language model (LLM) workflows to construct search terms designed to maximize corpus quality. Corpora generated by SeedAIchemy perform significantly better than a naive corpus and similarly to a manually curated corpus on a diverse range of target programs and libraries.

