Lung Nodule Malignancy Prediction in Sequential CT Scans: Summary of ISBI 2018 Challenge

Yoganand Balagurunathan, Andrew Beers, Michael McNitt-Gray, Lubomir Hadjiiski, Sandy Napel, Dmitry Goldgof, Gustavo Perez, Pablo Arbelaez, Alireza Mehrtash, Tina Kapur, Ehwa Yang, Jung Won Moon, Gabriel Bernardino Perez, Ricard Delgado-Gonzalo, M Mehdi Farhangi, Amir A Amini, Renkun Ni, Xue Feng, Aditya Bagari, Kiran Vaidhya, Benjamin Veasey, Wiem Safta, Hichem Frigui, Joseph Enguehard, Ali Gholipour, Laura Silvana Castillo, Laura Alexandra Daza, Paul Pinsky, Jayashree Kalpathy-Cramer, Keyvan Farahani

IEEE Transactions on Medical Imaging

Abstract

Lung cancer is by far the leading cause of cancer death in the US. Recent studies have demonstrated the effectiveness of screening with low-dose CT (LDCT) in reducing lung cancer-related mortality. Although LDCT detects lung nodules with high sensitivity, its specificity is low, and distinguishing benign from malignant lesions remains difficult. The ISBI 2018 Lung Nodule Malignancy Prediction Challenge, developed by a team from the Quantitative Imaging Network of the National Cancer Institute, focused on predicting lung nodule malignancy from two sequential LDCT screening exams using automated (non-manual) algorithms. We curated a cohort of 100 subjects who participated in the National Lung Screening Trial and had established pathological diagnoses. Data from 30 randomly selected subjects were used for training, and data from the remaining 70 were used for testing. Participants were evaluated on the area under the receiver operating characteristic curve (AUC) of the nodule-wise malignancy scores their algorithms generated on the test set. The challenge had 17 participants, of whom 11 teams submitted the method-description reports mandated by the challenge rules. Participants used quantitative methods and reported test AUCs ranging from 0.698 to 0.913. The top five contestants used deep learning approaches and reported AUCs between 0.87 and 0.91. The teams' predictors did not differ significantly from one another or from a volume change estimate (p = .05 with Bonferroni-Holm correction).
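The abstract names two statistical ingredients: ranking by the AUC of nodule-wise malignancy scores, and Bonferroni-Holm correction when comparing predictors. The following is a minimal sketch of both, not the challenge's evaluation code. It computes AUC through its Mann-Whitney interpretation (the fraction of malignant/benign nodule pairs in which the malignant nodule scores higher, with ties counted as one half) and applies Holm's step-down procedure to a list of p-values. The function names `auc_mann_whitney` and `holm_bonferroni` are hypothetical, the scores and p-values in the example are invented for illustration, and the sketch does not reproduce whatever pairwise test the organizers used to produce those p-values.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen malignant nodule (label 1)
    receives a higher score than a randomly chosen benign one (label 0).
    Tied pairs contribute 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign nodule")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> list:
    """Holm's step-down procedure. Returns, in input order, whether each
    hypothesis is rejected while controlling the family-wise error rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            # Stop at the first non-rejection; all larger p-values are retained.
            break
    return reject


if __name__ == "__main__":
    # Invented nodule-wise scores and pathology labels (1 = malignant).
    scores = [0.9, 0.4, 0.35, 0.8, 0.1]
    labels = [1, 0, 1, 1, 0]
    print(round(auc_mann_whitney(scores, labels), 3))  # 0.833 (5 of 6 pairs ranked correctly)

    # Invented p-values from four pairwise comparisons.
    print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

The pairwise loop is quadratic in the number of nodules, which is harmless at the scale of a 70-subject test set; a rank-based formula would scale better for larger cohorts. Holm's procedure is shown instead of plain Bonferroni because it is the correction the abstract names and is uniformly at least as powerful.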
