Pitfalls in prediction modeling for normal tissue toxicity in radiation therapy: an illustration with the individual radiation sensitivity and mammary carcinoma risk factor investigation cohorts
Purpose To identify the main causes underlying the failure of prediction models for radiation therapy toxicity to replicate. Methods and Materials Data were used from two German cohorts, Individual Radiation Sensitivity (ISE) (n=418) and Mammary Carcinoma Risk Factor Investigation (MARIE) (n=409), of breast cancer patients with similar characteristics and radiation therapy treatments. The toxicity endpoint chosen was telangiectasia. The LASSO (least absolute shrinkage and selection operator) logistic regression method was used to build a predictive model for a dichotomized endpoint (Radiation Therapy Oncology Group/European Organization for the Research and Treatment of Cancer score 0, 1, or ¿2). Internal areas under the receiver operating characteristic curve (inAUCs) were calculated by a naïve approach whereby the training data (ISE) were also used for calculating the AUC. Cross-validation was also applied to calculate the AUC within the same cohort, a second type of inAUC. Internal AUCs from cross-validation were calculated within ISE and MARIE separately. Models trained on one dataset (ISE) were applied to a test dataset (MARIE) and AUCs calculated (exAUCs). Results Internal AUCs from the naïve approach were generally larger than inAUCs from cross-validation owing to overfitting the training data. Internal AUCs from cross-validation were also generally larger than the exAUCs, reflecting heterogeneity in the predictors between cohorts. The best models with largest inAUCs from cross-validation within both cohorts had a number of common predictors: hypertension, normalized total boost, and presence of estrogen receptors. Surprisingly, the effect (coefficient in the prediction model) of hypertension on telangiectasia incidence was positive in ISE and negative in MARIE. Other predictors were also not common between the 2 cohorts, illustrating that overcoming overfitting does not solve the problem of replication failure of prediction models completely. Conclusions Overfitting and cohort heterogeneity are the 2 main causes of replication failure of prediction models across cohorts. Cross-validation and similar techniques (eg, bootstrapping) cope with overfitting, but the development of validated predictive models for radiation therapy toxicity requires strategies that deal with cohort heterogeneity.