Annotation-efficient, patch-based, explainable deep learning using curriculum method for breast cancer detection in screening mammography

Camurdan, Ozden; Tanyel, Toygar; Aktufan Cerekci, Esma; ALİS, DENİZ; Meltem, Emine; Denizoglu, Nurper; Seker, Mustafa; Öksüz, İlkay; KARAARSLAN, Ercan

doi:10.1186/s13244-025-01922-w

Insights into Imaging, vol. 16, no. 1, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 16, Issue: 1
Publication Date: 2025
DOI: 10.1186/s13244-025-01922-w
Journal Name: Insights into Imaging
Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, EMBASE, Directory of Open Access Journals
Keywords: Breast cancer detection, Curriculum learning, Deep learning, Explainable artificial intelligence (XAI), Mammography
Affiliated with Acıbadem Mehmet Ali Aydınlar University: Yes

Abstract

Objectives: To develop an efficient deep learning (DL) model for breast cancer detection in mammograms, utilizing both weak (image-level) and strong (bounding boxes) annotations and providing explainable artificial intelligence (XAI) with gradient-weighted class activation mapping (Grad-CAM), assessed by the ground truth overlap ratio.

Methods: Three radiologists annotated a balanced dataset of 1976 mammograms (cancer-positive and -negative) from three centers. We developed a patch-based DL model using curriculum learning, progressively increasing patch sizes during training. The model was trained under varying levels of strong supervision (0%, 20%, 40%, and 100% of the dataset), resulting in baseline, curriculum 20, curriculum 40, and curriculum 100 models. Training for each model was repeated ten times, with results presented as mean ± standard deviation. Model performance was also tested on an external dataset of 4276 mammograms to assess generalizability.

Results: F1 scores for the baseline, curriculum 20, curriculum 40, and curriculum 100 models were 80.55 ± 0.88, 82.41 ± 0.47, 83.03 ± 0.31, and 83.95 ± 0.55, respectively, with ground truth overlap ratios of 60.26 ± 1.91, 62.13 ± 1.2, 62.26 ± 1.52, and 64.18 ± 1.37. In the external dataset, F1 scores were 74.65 ± 1.35, 77.77 ± 0.73, 78.23 ± 1.78, and 78.73 ± 1.25, respectively, maintaining a similar performance trend.

Conclusion: Training DL models with a curriculum method and a patch-based approach yields satisfactory performance and XAI, even with a limited set of densely annotated data, offering a promising avenue for deploying DL in large-scale mammography datasets.

Critical relevance: This study introduces a DL model for mammography-based breast cancer detection, utilizing curriculum learning with limited, strongly labeled data. It showcases performance gains and better explainability, addressing challenges of extensive dataset needs and DL’s “black-box” nature.
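The training setup described in the Methods can be illustrated with a minimal sketch. It only shows the two ideas named in the abstract: keeping bounding boxes for a fraction of the images (0%, 20%, 40%, or 100%) and ordering training stages from small to large patches. The function names, the patch sizes, the number of epochs per stage, and the random selection rule are all hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Set, Tuple


def pick_strongly_labeled(image_ids: Sequence[str], fraction: float, seed: int = 0) -> Set[str]:
    """Choose which images keep their bounding boxes (hypothetical selection rule)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    count = round(len(image_ids) * fraction)
    return set(rng.sample(list(image_ids), count))


def curriculum_stages(patch_sizes: Sequence[int], epochs_per_stage: int) -> List[Tuple[int, int]]:
    """Return (patch_size, epochs) stages ordered from smallest to largest patch."""
    return [(size, epochs_per_stage) for size in sorted(patch_sizes)]


# 1976 is the dataset size reported in the abstract; the IDs are placeholders.
image_ids = [f"img_{i:04d}" for i in range(1976)]

# "Curriculum 20" setting: 20% of the images keep strong (bounding-box) labels.
strong = pick_strongly_labeled(image_ids, fraction=0.20)

# Hypothetical patch sizes in pixels; the abstract does not state the actual values.
stages = curriculum_stages(patch_sizes=[512, 128, 256], epochs_per_stage=5)

for patch_size, epochs in stages:
    print(f"train {epochs} epochs on {patch_size}x{patch_size} patches")
```

In a real pipeline each stage would presumably crop patches of the given size and continue training from the weights of the previous stage; that part depends on details the abstract does not give and is omitted here.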
Key Points: Increasing numbers of mammograms for radiologists to interpret pose a logistical challenge. We trained a DL model leveraging curriculum learning with mixed annotations for mammography. Using only 20% of the strong labels, the DL model outperformed the baseline model trained with image-level annotations alone. The study addresses the challenge of requiring extensive datasets and strong supervision for DL efficacy. The model demonstrated improved explainability through Grad-CAM, verified by a higher ground truth overlap ratio. The proposed approach also yielded robust performance on external testing data.
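The abstract does not define the ground truth overlap ratio, so the sketch below rests on one plausible reading: the share of strongly activated Grad-CAM pixels that fall inside the annotated bounding box. The function name, the 0.5 threshold, the box convention, and the toy heatmap are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# (row_min, col_min, row_max, col_max); the max bounds are exclusive.
Box = Tuple[int, int, int, int]


def overlap_ratio(heatmap: List[List[float]], box: Box, threshold: float = 0.5) -> float:
    """Share of above-threshold heatmap pixels lying inside the box (assumed definition)."""
    r0, c0, r1, c1 = box
    active = 0  # pixels at or above the threshold
    inside = 0  # active pixels that also fall inside the box
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value >= threshold:
                active += 1
                if r0 <= r < r1 and c0 <= c < c1:
                    inside += 1
    return inside / active if active else 0.0


# Toy 4x4 heatmap with values in [0, 1]; not real Grad-CAM output.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.6, 0.6],
    [0.0, 0.1, 0.2, 0.1],
]
ratio = overlap_ratio(heatmap, box=(1, 1, 3, 3))
print(f"{100 * ratio:.1f}")  # prints 80.0
```

Under this reading, a higher ratio means the model's attention concentrates on the annotated lesion rather than on surrounding tissue, which matches how the Key Points use the metric as evidence of explainability.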