SE AI HC May 12, 2025

A Case Study Investigating the Role of Generative AI in Quality Evaluations of Epics in Agile Software Development

Werner Geyer, Jessica He, Daita Sarkar, Michelle Brachman, Chris Hammond, Jennifer Heins, Zahra Ashktorab, Carlos Rosemberg, Charlie Hill

arXiv:2505.07664v15.92 citations h-index: 34 CHIWORK

Originality Synthesis-oriented

AI Analysis

This paper addresses the problem of poorly defined epics, which cause churn and delays for product managers in agile development, but it is an incremental application of existing AI methods.

The study investigated using large language models (LLMs) to evaluate the quality of agile epics in software development, finding high satisfaction among 17 product managers but also identifying challenges and adoption barriers.

The broad availability of generative AI offers new opportunities to support various work domains, including agile software development. Agile epics are a key artifact for product managers to communicate requirements to stakeholders. However, in practice, they are often poorly defined, leading to churn, delivery delays, and cost overruns. In this industry case study, we investigate opportunities for large language models (LLMs) to evaluate agile epic quality in a global company. Results from a user study with 17 product managers indicate how LLM evaluations could be integrated into their work practices, including perceived values and usage in improving their epics. High levels of satisfaction indicate that agile epics are a new, viable application of AI evaluations. However, our findings also outline challenges, limitations, and adoption barriers that can inform both practitioners and researchers on the integration of such evaluations into future agile work practices.
