SE, Jan 17, 2014

Lessons Learned and Results from Applying Data-Driven Cost Estimation to Industrial Data Sets

Jens Heidrich, Adam Trendowicz, Jürgen Münch, Yasushi Ishigai, Kenji Yokoyama, Nahomi Kikuchi, T. Kawaguchi

arXiv:1401.4256v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of applying data-driven cost estimation in industry when the available data are imperfect. Its contribution is incremental: it builds on existing methods and focuses on practical insights from a case study.

The study applied the Optimized Set Reduction (OSR(c)) method to industrial cost estimation data at Toshiba Information Systems (Japan) Corporation and found that estimation accuracy varied significantly with the data sets used and the way they were preprocessed.

The increasing availability of cost-relevant data in industry allows companies to apply data-intensive estimation methods. However, available data are often inconsistent, invalid, or incomplete, so most existing data-intensive estimation methods cannot be applied. Only a few estimation methods can deal with imperfect data to a certain extent (e.g., Optimized Set Reduction, OSR(c)), and results from evaluating these methods in practical environments are rare. This article describes a case study on the application of OSR(c) at Toshiba Information Systems (Japan) Corporation. An important result of the case study is that estimation accuracy varies significantly with the data sets used and the way these data are preprocessed. The study supports current results in the area of quantitative cost estimation and clearly illustrates typical problems. Experiences, lessons learned, and recommendations with respect to data preprocessing and data-intensive cost estimation in general are presented.
