A comparison of various feature extraction and machine learning methods for antimicrobial resistance prediction in Streptococcus pneumoniae


Creative Commons License

Kaya D. E., Ülgen E., Kocagöz A. S., Sezerman O. U.

Frontiers in Antibiotics, vol. 2, 2023 (Scopus)

  • Publication Type: Article / Full Article
  • Volume: 2
  • Publication Date: 2023
  • DOI: 10.3389/frabi.2023.1126468
  • Journal Name: Frontiers in Antibiotics
  • Indexed In: Scopus
  • Keywords: AMR, kmer, machine learning, SNP, Streptococcus pneumoniae, whole genome sequencing (WGS)
  • Affiliated with Acıbadem Mehmet Ali Aydınlar University: Yes

Abstract

Streptococcus pneumoniae is a major concern for clinicians and a global public health problem. This pathogen is associated with high morbidity and mortality rates and with antimicrobial resistance (AMR). In recent years, falling genome sequencing costs have made it possible to study the drug resistance of S. pneumoniae in greater depth, and machine learning (ML) has become a popular tool for understanding, diagnosing, treating, and predicting these phenotypes. Nucleotide k-mers, amino acid k-mers, single nucleotide polymorphisms (SNPs), and combinations of these features capture rich genetic information from whole-genome sequencing (WGS) data. This study compares these feature types and their combinations as inputs to different ML models that predict the AMR phenotype of S. pneumoniae for three antibiotics: penicillin, erythromycin, and tetracycline. A total of 980 pneumococcal strains were downloaded from the European Nucleotide Archive (ENA). We trained and compared models using several ML methods, including random forests, support vector machines, stochastic gradient boosting, and extreme gradient boosting. We found that key features of the AMR prediction model setup, as well as the choice of ML method, affected the results. The approach presented here can be applied in further studies to improve the accuracy and efficiency of AMR prediction.
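The abstract does not state the k-mer lengths, the SNP-calling pipeline, or how feature types were combined, so the following is only a minimal Python sketch, under stated assumptions, of how the three feature types and their combination could be turned into one feature matrix. Every function name (`kmer_counts`, `translate`, `snp_features`, `isolate_features`, `build_matrix`), the defaults `k_nt=3` and `k_aa=2`, the feature-name prefixes, and the toy sequences are hypothetical illustrations, not the authors' method or data. It assumes each isolate has already been aligned to a reference of equal length (the alignment step is not shown) and that a coding sequence in frame 0 is available for translation.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons are enumerated in TCAG order.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(
        product("TCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}
NT_ALPHABET = set("ACGT")
AA_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard residues; '*' (stop) and 'X' are excluded


def kmer_counts(seq, k, alphabet):
    """Count overlapping k-mers, skipping windows with characters outside `alphabet` (e.g. N, -, *)."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    return counts


def translate(cds):
    """Translate a coding sequence in frame 0; codons containing ambiguous bases become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def snp_features(aligned, reference):
    """One binary feature per site where an isolate (pre-aligned to the reference) differs from it."""
    if len(aligned) != len(reference):
        raise ValueError("isolate must be aligned to the reference (equal lengths)")
    feats = {}
    for pos, (ref, alt) in enumerate(zip(reference.upper(), aligned.upper()), start=1):
        # Gaps and ambiguous bases are ignored; only A/C/G/T substitutions count as SNPs.
        if ref != alt and ref in NT_ALPHABET and alt in NT_ALPHABET:
            feats[f"snp_{pos}_{ref}>{alt}"] = 1
    return feats


def isolate_features(aligned, reference, cds, k_nt=3, k_aa=2):
    """Combine nucleotide k-mer, amino acid k-mer and SNP features into one prefixed dict."""
    genome = aligned.upper().replace("-", "")  # nucleotide k-mers are counted on the ungapped sequence
    feats = {f"nt_{km}": n for km, n in kmer_counts(genome, k_nt, NT_ALPHABET).items()}
    feats.update({f"aa_{km}": n for km, n in kmer_counts(translate(cds), k_aa, AA_ALPHABET).items()})
    feats.update(snp_features(aligned, reference))
    return feats


def build_matrix(feature_dicts):
    """Align per-isolate feature dicts on a shared, sorted vocabulary (absent features -> 0)."""
    vocab = sorted(set().union(*feature_dicts))
    return vocab, [[d.get(name, 0) for name in vocab] for d in feature_dicts]


# Toy example (hypothetical sequences): a single A>G substitution at position 7 (codon AAA -> GAA).
reference = "ATGGCTAAAGGT"
isolate = "ATGGCTGAAGGT"
print(translate(isolate))                 # MAEG
print(snp_features(isolate, reference))   # {'snp_7_A>G': 1}
vocab, rows = build_matrix([
    isolate_features(reference, reference, cds=reference),
    isolate_features(isolate, reference, cds=isolate),
])
print(len(vocab))                         # 19
```

Prefixing feature names by type keeps the nucleotide, amino acid, and SNP columns distinct, so a "combined" feature set is simply the union of the per-type dictionaries. The resulting `rows`, paired with resistant/susceptible labels, could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`; real genomes would need much larger k and sparse storage, which this sketch does not attempt.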