TY - JOUR
T1 - Comparison of semi-automatic and deep learning-based automatic methods for liver segmentation in living liver transplant donors
AU - Kavur, A. Emre
AU - Gezer, Naciye Sinem
AU - Barış, Mustafa
AU - Şahin, Yusuf
AU - Özkan, Savaş
AU - Baydar, Bora
AU - Yüksel, Ulaş
AU - Kılıkçıer, Çağlar
AU - Olut, Şahin
AU - Akar, Gözde Bozdağı
AU - Ünal, Gözde
AU - Dicle, Oğuz
AU - Selver, M. Alper
N1 - Publisher Copyright: © Turkish Society of Radiology 2020.
PY - 2020
Y1 - 2020
N2 - PURPOSE We aimed to compare the accuracy and repeatability of emerging machine learning-based (i.e., deep learning) automatic segmentation algorithms with those of well-established interactive semi-automatic methods for determining liver volume in living liver transplant donors on computed tomography (CT) imaging. METHODS A total of 12 methods (6 semi-automatic, 6 fully automatic) were evaluated. The semi-automatic segmentation algorithms were based on both traditional iterative models (watershed, fast marching, region growing, and active contours) and modern techniques (robust statistics segmenter and super-pixels). These methods entailed an interaction mechanism, such as placing initialization seeds on images or determining a parameter range. The automatic methods were based on deep learning and included three framework templates (DeepMedic, NiftyNet and U-Net), the first two of which were applied with default parameter sets, while the last two involved adapted novel model designs. For 20 living donors (8 training and 12 test data-sets), a group of imaging scientists and radiologists created ground truths by performing manual segmentations on contrast-enhanced CT images. Each segmentation was evaluated using five metrics (i.e., volume overlap and relative volume errors, average/root-mean-square/maximum symmetrical surface distances). The results were mapped to a scoring system and a final grade was calculated by taking their average. Accuracy and repeatability were evaluated using slice-by-slice comparisons and volumetric analysis. Diversity and complementarity were observed through heatmaps. Majority voting (MV) and simultaneous truth and performance level estimation (STAPLE) algorithms were used to fuse the individual results. RESULTS The top four methods were automatic deep learning models, with scores of 79.63, 79.46, 77.15, and 74.50. The intra-user score was 95.14. Overall, automatic deep learning segmentation outperformed interactive techniques on all metrics. The mean liver volume of the ground truth was 1409.93±271.28 mL, while it was calculated as 1342.21±231.24 mL using automatic and 1201.26±258.13 mL using interactive methods, showing higher accuracy and less variation with automatic methods. The qualitative analysis of segmentation results showed significant diversity and complementarity, supporting the idea of using ensembles to obtain superior results. The fusion score of automatic methods reached 83.87 with MV and 86.20 with STAPLE, only slightly lower than the fusion of all methods (MV, 86.70; STAPLE, 88.74). CONCLUSION Use of the new deep learning-based automatic segmentation algorithms substantially increases the accuracy and repeatability of segmentation and volumetric measurements of the liver. Fusion of automatic methods based on ensemble approaches yields the best results with almost no additional time cost, owing to the potential for parallel execution of multiple models.
UR - http://www.scopus.com/inward/record.url?scp=85077479935&partnerID=8YFLogxK
U2 - 10.5152/dir.2019.19025
DO - 10.5152/dir.2019.19025
M3 - Article
C2 - 31904568
AN - SCOPUS:85077479935
SN - 1305-3825
VL - 26
SP - 11
EP - 21
JO - Diagnostic and Interventional Radiology
JF - Diagnostic and Interventional Radiology
IS - 1
ER -