SE, Feb 2, 2022

A Versatile Dataset of Agile Open Source Software Projects

arXiv:2202.00979v1, 32 citations, Has Code
Originality: Synthesis-oriented
AI Analysis

This dataset is a comprehensive resource for researchers studying Agile software development, though the contribution is incremental because it builds on existing data collection efforts.

The authors tackled the lack of holistic datasets for Agile open-source software projects by creating a versatile dataset with over 500,000 issues from 44 projects, making it publicly available for research in areas like effort estimation and issue prioritization.

Agile software development is nowadays a widely adopted practice in both open-source and industrial software projects. Agile teams typically rely heavily on issue management tools to document new issues and keep track of outstanding ones, in addition to storing their technical details, effort estimates, assignment to developers, and more. Previous work utilised the historical information stored in issue management systems for various purposes; however, when researchers make their empirical data public, it is usually relevant solely to the study's objective. In this paper, we present a more holistic and versatile dataset containing a wealth of information on more than 500,000 issues from 44 open-source Agile software projects, making it well-suited to several research avenues, and cross-analyses therein, including effort estimation, issue prioritization, issue assignment and many more. We make this data publicly available on GitHub to facilitate ease of use, maintenance, and extensibility.

Code Implementations: 1 repo