AI SEJul 19, 2025

Efficient Story Point Estimation With Comparative Learning

Monoshiz Mahbub Khan, Xiaoyin Xi, Andrew Meneely, Zhe Yu

arXiv:2507.14642v27.84 citationsh-index: 2

Originality Incremental advance

AI Analysis

This work addresses the efficiency of story point estimation for software developers by reducing cognitive burden, though it is incremental as it builds on existing machine learning methods with a novel calibration approach.

The paper tackled the problem of tedious story point estimation in agile software development by proposing a comparative learning framework that uses pairwise comparisons to predict story points, achieving a Spearman's rank correlation coefficient of 0.34 on average across 16 projects, which is similar to or better than regression models trained on ground truth data.

Story point estimation is an essential part of agile software development. Story points are unitless, project-specific effort estimates that help developers plan their sprints. Traditionally, developers estimate story points collaboratively using planning poker or other manual techniques. While the initial calibrating of the estimates to each project is helpful, once a team has converged on a set of precedents, story point estimation can become tedious and labor-intensive. Machine learning can reduce this burden, but only with enough context from the historical decisions made by the project team. That is, state-of-the-art models, such as GPT2SP and FastText-SVM, only make accurate predictions (within-project) when trained on data from the same project. The goal of this work is to streamline story point estimation by evaluating a comparative learning-based framework for calibrating project-specific story point prediction models. Instead of assigning a specific story point value to every backlog item, developers are presented with pairs of items, and indicate which item requires more effort. Using these comparative judgments, a machine learning model is trained to predict the story point estimates. We empirically evaluated our technique using data with 23,313 manual estimates in 16 projects. The model learned from comparative judgments can achieve on average 0.34 Spearman's rank correlation coefficient between its predictions and the ground truth story points. This is similar to, if not better than, the performance of a regression model learned from the ground truth story points. Therefore, the proposed comparative learning approach is more efficient than state-of-the-art regression-based approaches according to the law of comparative judgments - providing comparative judgments yields a lower cognitive burden on humans than providing ratings or categorical labels.

View on arXiv PDF

Similar