The VAMPIRE challenge: A multi-institutional validation study of CT ventilation imaging
CT ventilation imaging (CTVI) is being used to achieve functional avoidance in lung cancer radiation therapy in three clinical trials (NCT02528942, NCT02308709, NCT02843568). To address the need for common CTVI validation tools, we have built the Ventilation And Medical Pulmonary Image Registration Evaluation (VAMPIRE) Dataset, and present the results of the first VAMPIRE Challenge to compare relative ventilation distributions between different CTVI algorithms and other established ventilation imaging modalities.
The VAMPIRE Dataset includes 50 pairs of 4DCT scans and corresponding clinical or experimental ventilation scans, referred to as reference ventilation images (RefVIs). The dataset includes 25 humans imaged with Galligas 4DPET/CT, 21 humans imaged with DTPA‐SPECT, and 4 sheep imaged with Xenon‐CT. For the VAMPIRE Challenge, 16 subjects were allocated to a training group (with RefVI provided) and 34 subjects were allocated to a validation group (with RefVI blinded). Seven research groups downloaded the Challenge dataset and uploaded CTVIs based on deformable image registration (DIR) between the 4DCT inhale/exhale phases. Participants used DIR methods broadly classified into B‐splines, Free‐form, Diffeomorphisms, or Biomechanical modeling, with CT ventilation metrics based on the DIR evaluation of volume change, Hounsfield Unit change, or various hybrid approaches. All CTVIs were evaluated against the corresponding RefVI using the voxel‐wise Spearman coefficient rs and Dice similarity coefficients evaluated for low function lung (DSClow) and high function lung (DSChigh).
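The evaluation metrics above can be sketched in a few lines of Python. Note that the percentile thresholds used here to define low- and high-function lung (lowest/highest 33% of lung voxels), and the function names, are illustrative assumptions rather than the Challenge's exact definitions:

```python
import numpy as np

def _rank(x):
    # Simple ranking (no tie handling; ventilation values are continuous)
    order = np.argsort(x)
    ranks = np.empty(len(x))
    ranks[order] = np.arange(len(x))
    return ranks

def _spearman(a, b):
    # Spearman coefficient = Pearson correlation of the ranks
    ra = _rank(a) - (len(a) - 1) / 2.0
    rb = _rank(b) - (len(b) - 1) / 2.0
    return np.sum(ra * rb) / np.sqrt(np.sum(ra**2) * np.sum(rb**2))

def _dice(a, b):
    # Dice similarity coefficient between two boolean voxel sets
    return 2.0 * np.sum(a & b) / (np.sum(a) + np.sum(b))

def evaluate_ctvi(ctvi, refvi, lung_mask, pct=33):
    """Voxel-wise Spearman rs plus DSC for low/high function lung.
    The lowest/highest `pct` percentile thresholds are an assumed
    definition of low/high function lung, for illustration only."""
    c, r = ctvi[lung_mask], refvi[lung_mask]
    rs = _spearman(c, r)
    dsc_low = _dice(c < np.percentile(c, pct),
                    r < np.percentile(r, pct))
    dsc_high = _dice(c > np.percentile(c, 100 - pct),
                     r > np.percentile(r, 100 - pct))
    return rs, dsc_low, dsc_high
```

A perfectly correlated CTVI/RefVI pair yields rs = 1 and both DSC values equal to 1; because Spearman's coefficient depends only on ranks, it is insensitive to the arbitrary scaling of relative ventilation values across modalities.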
A total of 37 unique combinations of DIR method and CT ventilation metric were either submitted by participants directly or derived from participant‐submitted DIR motion fields using the in‐house software, VESPIR. The rs and DSC results reveal a high degree of inter‐algorithm and intersubject variability among the validation subjects, with algorithm rankings changing by up to ten positions depending on the choice of evaluation metric. The algorithm with the highest overall cross‐modality correlations used a biomechanical model‐based DIR with a hybrid ventilation metric, achieving a median (range) of 0.49 (0.27–0.73) for rs, 0.52 (0.36–0.67) for DSClow, and 0.45 (0.28–0.62) for DSChigh. All other algorithms exhibited at least one negative rs value, and/or one DSC value less than 0.5.
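One common way to derive a volume-change CTVI from a DIR motion field is the determinant of the Jacobian of the deformation, where det(J) > 1 indicates local expansion. The sketch below is a generic finite-difference implementation under assumed conventions (displacement field shaped (3, Z, Y, X) in physical units); it is not the VESPIR implementation, whose details are not given here:

```python
import numpy as np

def jacobian_ventilation(dvf, spacing=(1.0, 1.0, 1.0)):
    """Illustrative volume-change ventilation metric from a DIR
    displacement vector field (exhale-to-inhale).
    dvf: array of shape (3, Z, Y, X), displacement per axis in mm.
    Returns det(J) - 1 per voxel, where J = I + grad(u); positive
    values indicate local volume expansion (ventilation)."""
    a = np.empty((3, 3) + dvf.shape[1:])
    for i in range(3):          # displacement component
        for j in range(3):      # spatial axis of differentiation
            a[i, j] = np.gradient(dvf[i], spacing[j], axis=j)
            if i == j:
                a[i, j] += 1.0  # identity term of the Jacobian
    # Closed-form 3x3 determinant at every voxel
    det = (a[0, 0] * (a[1, 1] * a[2, 2] - a[1, 2] * a[2, 1])
         - a[0, 1] * (a[1, 0] * a[2, 2] - a[1, 2] * a[2, 0])
         + a[0, 2] * (a[1, 0] * a[2, 1] - a[1, 1] * a[2, 0]))
    return det - 1.0
```

For example, a displacement field that stretches the volume uniformly by 10% along one axis yields det(J) - 1 = 0.1 everywhere; hybrid metrics of the kind mentioned above would combine such a volume-change term with Hounsfield Unit change.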
The VAMPIRE Challenge results demonstrate that the cross‐modality correlation between CTVIs and the RefVIs varies not only with the choice of CTVI algorithm but also with the choice of RefVI modality, imaging subject, and the evaluation metric used to compare relative ventilation distributions. This variability may arise from the fact that each of the different CTVI algorithms and RefVI modalities provides a distinct physiologic measurement. Ultimately, this variability, coupled with the lack of a “gold standard,” highlights the ongoing importance of further validation studies before CTVI can be widely translated from academic centers to the clinic. It is hoped that the information gleaned from the VAMPIRE Challenge can help inform future validation efforts.