TY - JOUR
T1 - Annotation-efficient, patch-based, explainable deep learning using curriculum method for breast cancer detection in screening mammography
AU - Camurdan, Ozden
AU - Tanyel, Toygar
AU - Aktufan Cerekci, Esma
AU - Alis, Deniz
AU - Meltem, Emine
AU - Denizoglu, Nurper
AU - Seker, Mustafa Ege
AU - Oksuz, Ilkay
AU - Karaarslan, Ercan
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
N2 - Objectives: To develop an efficient deep learning (DL) model for breast cancer detection in mammograms, utilizing both weak (image-level) and strong (bounding boxes) annotations and providing explainable artificial intelligence (XAI) with gradient-weighted class activation mapping (Grad-CAM), assessed by the ground truth overlap ratio. Methods: Three radiologists annotated a balanced dataset of 1976 mammograms (cancer-positive and -negative) from three centers. We developed a patch-based DL model using curriculum learning, progressively increasing patch sizes during training. The model was trained under varying levels of strong supervision (0%, 20%, 40%, and 100% of the dataset), resulting in baseline, curriculum 20, curriculum 40, and curriculum 100 models. Training for each model was repeated ten times, with results presented as mean ± standard deviation. Model performance was also tested on an external dataset of 4276 mammograms to assess generalizability. Results: F1 scores for the baseline, curriculum 20, curriculum 40, and curriculum 100 models were 80.55 ± 0.88, 82.41 ± 0.47, 83.03 ± 0.31, and 83.95 ± 0.55, respectively, with ground truth overlap ratios of 60.26 ± 1.91, 62.13 ± 1.2, 62.26 ± 1.52, and 64.18 ± 1.37. In the external dataset, F1 scores were 74.65 ± 1.35, 77.77 ± 0.73, 78.23 ± 1.78, and 78.73 ± 1.25, respectively, maintaining a similar performance trend. Conclusion: Training DL models with a curriculum method and a patch-based approach yields satisfactory performance and XAI, even with a limited set of densely annotated data, offering a promising avenue for deploying DL in large-scale mammography datasets. Critical relevance: This study introduces a DL model for mammography-based breast cancer detection, utilizing curriculum learning with limited, strongly labeled data. It showcases performance gains and better explainability, addressing challenges of extensive dataset needs and DL’s “black-box” nature. 
Key Points: Increasing numbers of mammograms for radiologists to interpret pose a logistical challenge. We trained a DL model leveraging curriculum learning with mixed annotations for mammography. The DL model outperformed the baseline model with image-level annotations using only 20% of the strong labels. The study addresses the challenge of requiring extensive datasets and strong supervision for DL efficacy. The model demonstrated improved explainability through Grad-CAM, verified by a higher ground truth overlap ratio. The proposed approach also yielded robust performance on external testing data.
AB - Objectives: To develop an efficient deep learning (DL) model for breast cancer detection in mammograms, utilizing both weak (image-level) and strong (bounding boxes) annotations and providing explainable artificial intelligence (XAI) with gradient-weighted class activation mapping (Grad-CAM), assessed by the ground truth overlap ratio. Methods: Three radiologists annotated a balanced dataset of 1976 mammograms (cancer-positive and -negative) from three centers. We developed a patch-based DL model using curriculum learning, progressively increasing patch sizes during training. The model was trained under varying levels of strong supervision (0%, 20%, 40%, and 100% of the dataset), resulting in baseline, curriculum 20, curriculum 40, and curriculum 100 models. Training for each model was repeated ten times, with results presented as mean ± standard deviation. Model performance was also tested on an external dataset of 4276 mammograms to assess generalizability. Results: F1 scores for the baseline, curriculum 20, curriculum 40, and curriculum 100 models were 80.55 ± 0.88, 82.41 ± 0.47, 83.03 ± 0.31, and 83.95 ± 0.55, respectively, with ground truth overlap ratios of 60.26 ± 1.91, 62.13 ± 1.2, 62.26 ± 1.52, and 64.18 ± 1.37. In the external dataset, F1 scores were 74.65 ± 1.35, 77.77 ± 0.73, 78.23 ± 1.78, and 78.73 ± 1.25, respectively, maintaining a similar performance trend. Conclusion: Training DL models with a curriculum method and a patch-based approach yields satisfactory performance and XAI, even with a limited set of densely annotated data, offering a promising avenue for deploying DL in large-scale mammography datasets. Critical relevance: This study introduces a DL model for mammography-based breast cancer detection, utilizing curriculum learning with limited, strongly labeled data. It showcases performance gains and better explainability, addressing challenges of extensive dataset needs and DL’s “black-box” nature. 
Key Points: Increasing numbers of mammograms for radiologists to interpret pose a logistical challenge. We trained a DL model leveraging curriculum learning with mixed annotations for mammography. The DL model outperformed the baseline model with image-level annotations using only 20% of the strong labels. The study addresses the challenge of requiring extensive datasets and strong supervision for DL efficacy. The model demonstrated improved explainability through Grad-CAM, verified by a higher ground truth overlap ratio. The proposed approach also yielded robust performance on external testing data.
KW - Breast cancer detection
KW - Curriculum learning
KW - Deep learning
KW - Explainable artificial intelligence (XAI)
KW - Mammography
UR - http://www.scopus.com/inward/record.url?scp=105000302643&partnerID=8YFLogxK
U2 - 10.1186/s13244-025-01922-w
DO - 10.1186/s13244-025-01922-w
M3 - Article
AN - SCOPUS:105000302643
SN - 1869-4101
VL - 16
JO - Insights into Imaging
JF - Insights into Imaging
IS - 1
M1 - 60
ER -