SEAI · Apr 1, 2025

Leveraging LLMs for User Stories in AI Systems: UStAI Dataset

arXiv:2504.00513v3 · 12 citations · h-index: 10 · Has Code · PROMISE
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of limited research artifacts for AI system requirements, offering a dataset and method to aid in early requirements elicitation, though it is incremental in applying LLMs to a specific domain.

The paper tackles the lack of open-source requirements for AI systems by investigating whether LLMs can generate user stories from scholarly paper abstracts. The result is a dataset of 1260 user stories from 42 abstracts across 26 domains, with quality assessed using the Quality User Story (QUS) framework.

AI systems are gaining widespread adoption across various sectors and domains. Creating high-quality AI system requirements is crucial for aligning the AI system with business goals and consumer values and for social responsibility. However, with the uncertain nature of AI systems and the heavy reliance on sensitive data, more research is needed to address the elicitation and analysis of AI systems requirements. With the proprietary nature of many AI systems, there is a lack of open-source requirements artifacts and technical requirements documents for AI systems, limiting broader research and investigation. With Large Language Models (LLMs) emerging as a promising alternative to human-generated text, this paper investigates the potential use of LLMs to generate user stories for AI systems based on abstracts from scholarly papers. We conducted an empirical evaluation using three LLMs and generated 1260 user stories from 42 abstracts from 26 domains. We assess their quality using the Quality User Story (QUS) framework. Moreover, we identify relevant non-functional requirements (NFRs) and ethical principles. Our analysis demonstrates that the investigated LLMs can generate user stories inspired by the needs of various stakeholders, offering a promising approach for generating user stories for research purposes and for aiding in the early requirements elicitation phase of AI systems. We have compiled and curated a collection of stories generated by various LLMs into a dataset (UStAI), which is now publicly available for use.
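To make the quality-assessment step more concrete, the following is a minimal sketch of one kind of check a QUS-style review might automate: whether a story follows the common "As a <role>, I want <means>, so that <ends>" template with at least a role and a means. It is an illustration only and not the paper's evaluation procedure. The names `UserStory` and `parse_user_story`, the regular expression, and the example stories are assumptions introduced here and are not taken from the UStAI dataset.

```python
import re
from typing import NamedTuple, Optional


class UserStory(NamedTuple):
    """Hypothetical container for the parts of a template-style user story."""
    role: str
    means: str
    ends: Optional[str]  # the "so that ..." clause is treated as optional


# Assumed template: "As a/an <role>, I want <means>[, so that <ends>][.]"
_STORY_PATTERN = re.compile(
    r"^\s*As an? (?P<role>.+?),\s*I want (?P<means>.+?)"
    r"(?:,?\s*so that (?P<ends>.+?))?\s*\.?\s*$",
    re.IGNORECASE,
)


def parse_user_story(text: str) -> Optional[UserStory]:
    """Return the story's parts, or None if it does not fit the template."""
    match = _STORY_PATTERN.match(text)
    if match is None:
        return None
    ends = match.group("ends")
    return UserStory(
        role=match.group("role").strip(),
        means=match.group("means").strip(),
        ends=ends.strip() if ends else None,
    )


if __name__ == "__main__":
    # Invented example inputs, for illustration only.
    examples = [
        "As a radiologist, I want the system to highlight suspicious regions, "
        "so that I can review scans faster.",
        "As an auditor, I want an explanation for each prediction",
        "The system shall encrypt data.",
    ]
    for story in examples:
        print(parse_user_story(story))
```

A purely syntactic check like this covers only template conformance. Criteria that depend on meaning, such as ambiguity or conflicts between stories, would still need human or model-based judgment.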

Code Implementations: 1 repo
