Optimal utilization of historical data sets for the construction of software cost prediction models

Liu, Qin (2006) Optimal utilization of historical data sets for the construction of software cost prediction models. Doctoral thesis, Northumbria University.

The accurate prediction of software development cost at an early stage of the development life-cycle can have a vital economic impact and provides fundamental information for management decision making. However, it is not well understood in practice how to optimally utilize historical software project data for the construction of cost prediction models. This is because the analysis of historical data sets for software cost estimation leads to many practical difficulties. In addition, little research has been done to demonstrate the benefits. To overcome these limitations, this research proposes a preliminary data analysis framework, which is an extension of Maxwell's study. The proposed framework is based on a set of statistical analysis methods, such as correlation analysis, stepwise ANOVA, and univariate analysis, and provides a formal basis for the construction of cost prediction models from historical data sets. The proposed framework is empirically evaluated against commonly used prediction methods, namely Ordinary Least-Squares Regression (OLS), Robust Regression (RR), Classification and Regression Trees (CART), and K-Nearest Neighbour (KNN), and is applied to both heterogeneous and homogeneous data sets. Formal statistical significance testing was performed for the comparisons. The results of the comparative evaluation suggest that the proposed preliminary data analysis framework is capable of constructing more accurate prediction models for all selected prediction techniques. The predictor variables processed by the framework are statistically significant at the 95% confidence level for both parametric techniques (OLS and RR) and one non-parametric technique (CART). Both the heterogeneous and the homogeneous data sets benefit from the application of the proposed framework in terms of project effort prediction accuracy. The homogeneous data set is more effective after being processed by the framework.
Overall, the evaluation results demonstrate that the proposed framework has excellent applicability. Further research could focus on two main directions. First, improve the applicability by integrating missing data techniques, such as listwise deletion (LD) and mean imputation (MI), for handling missing values in historical data sets. Second, apply benchmarking to enable comparisons, i.e. allowing companies to compare themselves with respect to their productivity or quality.

Item Type: Thesis (Doctoral)
Subjects: N100 Business studies
Department: University Services > Graduate School > Doctor of Philosophy
Faculties > Business and Law > Newcastle Business School
Depositing User: EPrint Services
Date Deposited: 06 Apr 2010 11:03
Last Modified: 17 Dec 2023 15:46
URI: https://nrl.northumbria.ac.uk/id/eprint/2129