SE · AI · Sep 23, 2025

Reverse Engineering User Stories from Code using Large Language Models

arXiv:2509.19587v1 · 1 citation · h-index: 1 · CASCON
Originality: Incremental advance
AI Analysis

The paper addresses the problem of missing or outdated user stories in agile development of legacy systems. The advance is incremental because it builds on existing LLM capabilities.

The paper tackles the automatic recovery of user stories from source code with large language models. It reports an average F1 score of 0.8 for code up to 200 NLOC and shows that a small 8B model can match a 70B model when given a single example.

User stories are essential in agile development, yet often missing or outdated in legacy and poorly documented systems. We investigate whether large language models (LLMs) can automatically recover user stories directly from source code and how prompt design impacts output quality. Using 1,750 annotated C++ snippets of varying complexity, we evaluate five state-of-the-art LLMs across six prompting strategies. Results show that all models achieve, on average, an F1 score of 0.8 for code up to 200 NLOC. Our findings show that a single illustrative example enables the smallest model (8B) to match the performance of a much larger 70B model. In contrast, structured reasoning via Chain-of-Thought offers only marginal gains, primarily for larger models.
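To make the contrast between prompting with and without "a single illustrative example" concrete, here is a minimal sketch of how a zero-shot and a one-shot prompt might be assembled. The function name `build_prompt`, the instruction wording, and the story template are hypothetical and chosen for illustration only. They are not the prompts used in the paper.

```python
from typing import Optional, Tuple

# Assumption: the common "As a ..., I want ..., so that ..." story format.
USER_STORY_TEMPLATE = "As a <role>, I want <goal>, so that <benefit>."


def build_prompt(code: str, example: Optional[Tuple[str, str]] = None) -> str:
    """Return a zero-shot prompt, or a one-shot prompt if an example is given.

    `example` is a hypothetical (code, user_story) pair shown to the model
    before the target snippet.
    """
    parts = [
        "Read the C++ code and write the user story it implements.",
        f"Use the form: {USER_STORY_TEMPLATE}",
    ]
    if example is not None:
        example_code, example_story = example
        # One-shot: a single worked example precedes the target code.
        parts += [
            "",
            "Example code:",
            example_code,
            "Example user story:",
            example_story,
        ]
    # The target snippet always comes last, followed by the answer cue.
    parts += ["", "Code:", code, "User story:"]
    return "\n".join(parts)
```

Calling `build_prompt(snippet)` yields the zero-shot variant, while `build_prompt(snippet, example=(demo_code, demo_story))` yields the one-shot variant. The resulting string would then be sent to whichever model is under evaluation.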

