HIBIT2021, Ankara, Turkey, 10-11 September 2021, pp. 53-54 (Summary Text)
Application of machine learning algorithm for the accurate diagnosis of breast cancer
Rumeysa Fayetörbay and Uğur Sezerman
Department of Biostatistics and Bioinformatics, Graduate School of Health Sciences,
Acıbadem MAA University, İstanbul, Turkey
With 2.26 million new cases in 2020, breast cancer is the most common cancer type
[1]. Globally, it is the most frequently observed malignancy in women, accounting
for approximately 1 in 4 of their cancer cases [2]. Samples were taken from breast
masses using the fine needle aspiration (FNA) biopsy technique [3]. Ten features
relevant to diagnostic accuracy are computed from the digitized FNA image: area,
compactness, concave points, concavity, fractal dimension, perimeter, radius,
smoothness, symmetry and texture [3]. These features describe the characteristics
of the cell nuclei shown in the digital images [3]. For each feature, three values
are reported per image: the mean, the standard error and the 'worst' value [3]. In
total, the dataset therefore contains 30 features (10 features × 3 values); it is
also available in the UCI Machine Learning Repository.
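As a minimal sketch of this 10 × 3 layout, assuming scikit-learn is installed: its `load_breast_cancer` function bundles a copy of the UCI Wisconsin Diagnostic Breast Cancer data (569 samples, 30 features), which we assume matches the dataset described here. The snippet only groups the 30 columns by summary statistic; it is an illustration, not part of the original analysis.

```python
from sklearn.datasets import load_breast_cancer

# Load the bundled copy of the Wisconsin Diagnostic Breast Cancer data
# (no download needed); X is a (569, 30) array, y holds 0/1 labels.
data = load_breast_cancer()
X, y = data.data, data.target
names = list(data.feature_names)

# scikit-learn names the three summaries "mean <feature>",
# "<feature> error" (standard error) and "worst <feature>";
# group the 30 columns by summary statistic.
groups = {
    "mean": [n for n in names if n.startswith("mean ")],
    "standard error": [n for n in names if n.endswith(" error")],
    "worst": [n for n in names if n.startswith("worst ")],
}

for summary, cols in groups.items():
    print(f"{summary}: {len(cols)} features")
print("total:", X.shape[1], "features,", X.shape[0], "samples")
```

Each group should contain the same ten nuclear features (radius, texture, perimeter and so on), one summary statistic per group.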
Machine learning algorithms have been applied to feature extraction, classification
and clustering. To prevent overfitting in logistic regression models, the least
absolute shrinkage and selection operator (LASSO), a penalized regression method
used mainly for variable selection and regularization, is one alternative. LASSO
performs L1 regularization: it adds a penalty proportional to the sum of the
absolute values of the regression coefficients, which shrinks the estimates towards
zero and sets some coefficients exactly to zero, effectively removing the
corresponding variables, which can improve prediction accuracy [4].
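In the standard formulation (not necessarily the exact parameterization used in this study), L1-penalized logistic regression for a binary response y_i in {0, 1} with feature vector x_i (here d = 30 features) estimates the coefficients by minimizing the average negative log-likelihood plus λ times the sum of the absolute coefficients, with the intercept β_0 conventionally left unpenalized:

```latex
(\hat{\beta}_0, \hat{\boldsymbol{\beta}})
  = \arg\min_{\beta_0,\,\boldsymbol{\beta}}
    \left\{
      -\frac{1}{n}\sum_{i=1}^{n}
        \Big[ y_i \log \pi_i + (1 - y_i)\log(1 - \pi_i) \Big]
      + \lambda \sum_{j=1}^{d} \lvert \beta_j \rvert
    \right\},
\qquad
\pi_i = \frac{1}{1 + \exp\!\big(-(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta})\big)}
```

Because the absolute-value penalty is not differentiable at zero, a sufficiently large tuning parameter λ ≥ 0 sets some estimates exactly to zero, which is how LASSO performs variable selection; λ is typically chosen by cross-validation.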
In this study, our main aim is to identify the characteristics of breast masses
that directly indicate whether the cells are benign or malignant. For this purpose,
we partitioned the data into training and test sets and encoded the diagnosis
(benign/malignant) as a binary response variable. After making predictions on the
test data with the lasso regression model, we found that it predicted the response variable
with 97.2% accuracy. Of the 30 features, 13 retained non-zero coefficients after
variable selection: concave points mean, symmetry mean, compactness standard error,
fractal dimension standard error, radius standard error, smoothness standard error,
texture standard error, concave points worst, concavity worst, radius worst,
smoothness worst, symmetry worst and texture worst. The remaining 17 coefficients
were shrunk to zero by the model. The sensitivity and specificity of the model were
approximately 96% and 98%, respectively. To further evaluate the lasso regression
model, we plotted the ROC curve, which showed a clear separation between the two
groups, close to the ideal. These results suggest that such machine learning
algorithms have high potential for identifying the features relevant to the
accurate diagnosis of breast cancer.
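The workflow described above (train/test split, binary response, L1-penalized logistic regression, accuracy, sensitivity, specificity, ROC analysis and the list of non-zero coefficients) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: the 70/30 stratified split, the random seed, feature standardization, the coding malignant = 1 and the penalty strength C = 1.0 are all hypothetical choices, so the printed metrics and the set of selected features will generally differ from the values reported in this study.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = data.data
# scikit-learn codes 0 = malignant, 1 = benign; recode (assumption) so
# that 1 = malignant, the "positive" class for sensitivity.
y = (data.target == 0).astype(int)

# Hypothetical split settings: 30% held out, stratified, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# L1-penalized (LASSO) logistic regression; C is the inverse of the
# penalty strength (smaller C -> more coefficients exactly zero).
# C=1.0 is an assumed value, not the one used in the study.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # estimated P(malignant)

accuracy = accuracy_score(y_test, y_pred)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
sensitivity = tp / (tp + fn)  # share of malignant cases detected
specificity = tn / (tn + fp)  # share of benign cases correctly cleared
auc = roc_auc_score(y_test, y_score)  # area under the ROC curve

# Features whose coefficient was not shrunk to zero by the L1 penalty.
coefs = model.named_steps["logisticregression"].coef_.ravel()
selected = [n for n, c in zip(data.feature_names, coefs) if c != 0]

print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f} AUC={auc:.3f}")
print(f"{len(selected)} of {len(coefs)} coefficients non-zero:")
for name in selected:
    print("  ", name)
```

To draw the ROC curve itself rather than only its area, `y_test` and `y_score` can be passed to `sklearn.metrics.roc_curve`, which returns the false positive rates, true positive rates and thresholds to plot. Standardizing the features first matters here because the L1 penalty treats all coefficients on the same scale.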
References
1. Sung, Hyuna et al. “Global Cancer Statistics 2020: GLOBOCAN Estimates
of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries.” CA:
a cancer journal for clinicians vol. 71,3 (2021): 209-249.
2. Momozawa, Yukihide et al. “Germline pathogenic variants of 11 breast cancer
genes in 7,051 Japanese patients and 11,241 controls.” Nature communications
vol. 9,1 4083. 4 Oct. 2018.
3. Wolberg, W H, and O L Mangasarian. “Multisurface method of pattern separation for medical diagnosis applied to breast cytology.” Proceedings of the
National Academy of Sciences of the United States of America vol. 87,23
(1990): 9193-6. doi:10.1073/pnas.87.23.9193.
4. Veen, Kevin M et al. “A clinician’s guide for developing a prediction model: a
case study using real-world data of patients with castration-resistant prostate
cancer.” Journal of cancer research and clinical oncology vol. 146,8 (2020):
2067-2075.