Faculty of Engineering and Information Sciences - Papers: Part B

The effect of imputing missing clinical attribute values on training lung cancer survival prediction model performance

Mohamed S. Barakat, University of New South Wales
Matthew Field, Liverpool and Macarthur Cancer Therapy Centres
Aditya K. Ghose, University of Wollongong
David Stirling, University of Wollongong
Lois C. Holloway, University of Wollongong
Shalini K. Vinod, Liverpool Hospital, Ingham Institute For Applied Medical Research, University of Sydney, University of New South Wales
Andre Dekker, Liverpool and Macarthur Cancer Therapy Centres, Maastricht University
David Thwaites, University of Sydney

RIS ID

127838

Publication Details

M. S. Barakat, M. Field, A. Ghose, D. Stirling, L. Holloway, S. Vinod, A. Dekker & D. Thwaites, "The effect of imputing missing clinical attribute values on training lung cancer survival prediction model performance," Health Information Science and Systems, vol. 5, pp. 16-1-16-11, 2017.

Abstract

According to estimates by the World Health Organization and the International Agency for Research on Cancer, lung cancer is the most common cause of death from cancer worldwide. The last few years have seen growing attention to the use of clinical decision support systems in medicine generally and in cancer care in particular. These systems can predict a patient's likelihood of survival by learning from the records of previously treated patients. The datasets mined to develop clinical decision support functionality are often incomplete, which adversely affects the quality of the models developed and the decision support offered. Imputing missing data using statistical analysis is a common approach to addressing the missing data problem. This work investigates the effect of imputation methods for missing data when preparing a training dataset for a Non-Small Cell Lung Cancer survival prediction model, using several machine learning algorithms. The investigation includes an assessment of the effect of imputation algorithm error on prediction performance, and a comparison between training on a smaller dataset of complete real records and training on a larger dataset that includes imputed values. Our results show that even when the proportion of records with some missing data is very high (> 80%), imputation can lead to prediction models with an AUC (0.68-0.72) comparable to that of models trained with complete data records.
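
The trade-off described in the abstract, between keeping only complete records and keeping every record with missing values filled in, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the abstract does not name the imputation algorithms studied, so simple column-mean imputation stands in for them, and the toy records, scores and the function names `complete_cases`, `mean_impute` and `auc` are hypothetical, not taken from the paper.

```python
from statistics import mean


def complete_cases(rows):
    """Keep only records with no missing (None) attribute values."""
    return [r for r in rows if None not in r]


def mean_impute(rows):
    """Fill each missing value with the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    col_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows
    ]


def auc(scores, labels):
    """AUC: chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy records: two numeric attributes, None marks a missing value.
rows = [[1.0, None], [2.0, 4.0], [None, 6.0], [3.0, 8.0]]

small_complete = complete_cases(rows)  # 2 of 4 records survive
large_imputed = mean_impute(rows)      # all 4 records kept, gaps filled

# Hypothetical model scores and survival labels, only to show the AUC calculation.
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

In this toy case complete-case filtering discards half of the records, while imputation retains all four at the cost of introducing estimated values. An AUC such as the one computed here is the kind of figure the abstract reports when comparing models trained on the two kinds of dataset.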

Included in

Engineering Commons, Science and Technology Studies Commons

Link to publisher version (DOI)

http://dx.doi.org/10.1007/s13755-017-0039-4
