Predicting delays in software projects using networked classification
Software projects have a high risk of cost and schedule overruns, which has been a source of concern for the software engineering community for a long time. One of the challenges in software project management is to make reliable prediction of delays in the context of constant and rapid changes inherent in software projects. This paper presents a novel approach to providing automated support for project managers and other decision makers in predicting whether a subset of software tasks (among the hundreds to thousands of ongoing tasks) in a software project have a risk of being delayed. Our approach makes use of not only features specific to individual software tasks (i.e. local data) -- as done in previous work -- but also their relationships (i.e. networked data). In addition, using collective classification, our approach can simultaneously predict the degree of delay for a group of related tasks. Our evaluation results show a significant improvement over traditional approaches which perform classification on each task independently: achieving 46% -- 97% precision (49% improved), 46% -- 97% recall (28% improved), 56% -- 75% F-measure (39% improved), and 78% -- 95% Area Under the ROC Curve (16% improved).