SE · Jan 20, 2022

An Alternative Issue Tracking Dataset of Public Jira Repositories

arXiv:2201.08368v3 · 25 citations
AI Analysis

This paper provides a new dataset for researchers studying issue tracking systems, filling a gap in available data for Jira, which is widely used in practice but understudied compared with systems such as GitHub.

The authors tackled the lack of diverse public Jira datasets by releasing a dataset of 16 public Jira repositories with 1822 projects, spanning 2.7 million issues, 32 million changes, 9 million comments, and 1 million issue links, aiming to enable research on issue evolution and cross-tool analysis.

Organisations use issue tracking systems (ITSs) to track and document their projects' work in units called issues. This style of documentation encourages evolutionary refinement, as each issue can be independently improved, commented on, linked to other issues, and progressed through the organisational workflow. Commonly studied ITSs so far include GitHub, GitLab, and Bugzilla, while Jira, one of the most popular ITSs in practice with a wealth of additional information, has yet to receive similar attention. Unfortunately, diverse public Jira datasets are rare, likely due to the difficulty in finding and accessing these repositories. With this paper, we release a dataset of 16 public Jiras with 1822 projects, spanning 2.7 million issues with a combined total of 32 million changes, 9 million comments, and 1 million issue links. We believe this Jira dataset will lead to many fruitful research projects investigating issue evolution, issue linking, cross-project analysis, as well as cross-tool analysis when combined with existing well-studied ITS datasets.

