RDDSVM: accurate prediction of A-to-I RNA editing sites from sequence using support vector machines

Tac, Huseyin; Koroglu, Mustafa; SEZERMAN, Osman

doi:10.1007/s10142-021-00805-9

Tac H. A., Koroglu M., SEZERMAN O. U.

FUNCTIONAL & INTEGRATIVE GENOMICS, vol. 21, no. 5-6, pp. 633-643, 2021 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 21 Issue: 5-6
Publication Date: 2021
DOI: 10.1007/s10142-021-00805-9
Journal Name: FUNCTIONAL & INTEGRATIVE GENOMICS
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, BIOSIS, CAB Abstracts, Chemical Abstracts Core, EMBASE, MEDLINE, Veterinary Science Database
Pages: pp. 633-643
Keywords: RNA editing, RNA-seq, Machine learning, Support vector machines, ADENOSINE
Affiliated with Acıbadem Mehmet Ali Aydınlar University: Yes

Abstract

Adenosine-to-inosine (A-to-I) editing in RNA is involved in various biological processes, such as gene expression, alternative splicing, and mRNA degradation, and has been associated with carcinogenesis and various human diseases. Accurate identification of RNA editing sites in the transcriptome is therefore valuable for both research and medicine. RNA-seq is well suited to detecting RNA editing events in condition-specific cells. However, computational methods for analyzing RNA-seq data carry a considerable risk of false positives due to mapping errors. In this study, we developed a simple machine learning method that uses support vector machines (SVMs), trained on sequence and structure information derived from the sequences flanking experimentally verified A-to-I editing sites, to predict new A-to-I editing sites in RNA. The best performance was obtained by the model that uses the composition of triplet sequence elements in the regions flanking A-to-I editing sites. With this model, the SVM classifier also performed well on experimentally verified data, with a sensitivity of 92.8%, a specificity of 77.1%, and an accuracy of 90.2%. To compare the predictive capacity of our method with that of other classifiers that use sequence information, we used human A-to-I RNA editing sites validated by Sanger sequencing. Of the 58 validated editing sites, our method correctly recognized 53, an accuracy of 91.4%, outperforming the other classifiers. To our knowledge, this is the first use of the composition of triplet sequence elements neighboring A-to-I editing sites to predict new A-to-I editing sites in RNA. The method is easy to run and computationally inexpensive, making it a convenient choice for facilities with limited resources.
To make the method publicly available, we developed RDDSVM, an open-source program that classifies candidate A-to-I RNA editing sites using support vector machines.
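The abstract does not specify exactly how the triplet features are computed. As a minimal sketch, assuming "triplet composition" means the relative frequencies of the 64 overlapping trinucleotides in fixed-length windows on either side of the candidate adenosine (pooled across both flanks), the feature extraction and the sensitivity, specificity, and accuracy figures quoted above could be computed as follows. The function names `triplet_composition` and `confusion_metrics`, the window half-width `flank`, and the pooling of the two flanks are hypothetical illustrations, not RDDSVM's actual code; the structure features mentioned in the abstract are not modeled here.

```python
import math
from itertools import product

# All 64 possible triplets over the DNA alphabet, in a fixed order so that
# every candidate site maps to a feature vector with the same layout.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_composition(seq: str, site: int, flank: int) -> list[float]:
    """Relative frequency of each overlapping triplet in the flanks of `site`.

    Hypothetical helper (not the RDDSVM code): `site` is the 0-based index of
    the candidate adenosine and `flank` is an assumed window half-width.
    """
    # Accept RNA (U) or DNA (T) input by normalizing to the DNA alphabet.
    seq = seq.upper().replace("U", "T")
    upstream = seq[max(0, site - flank):site]
    downstream = seq[site + 1:site + 1 + flank]
    counts = dict.fromkeys(TRIPLETS, 0)
    total = 0
    for region in (upstream, downstream):
        # Slide a 3-nt window over each flank separately so that no triplet
        # includes the candidate editing site itself.
        for i in range(len(region) - 2):
            tri = region[i:i + 3]
            if tri in counts:  # skip triplets containing N or other symbols
                counts[tri] += 1
                total += 1
    # Normalize counts to frequencies; an empty window yields all zeros.
    return [counts[t] / total if total else 0.0 for t in TRIPLETS]


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    A ratio whose denominator is zero is undefined and returned as NaN.
    """
    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fn + tn + fp),
    }


# Usage: a toy sequence with the candidate A at index 3 and 3-nt flanks.
features = triplet_composition("UUGAACU", site=3, flank=3)
print(len(features))                    # 64
print(features[TRIPLETS.index("TTG")])  # 0.5

# Treating the 58 Sanger-validated sites as positives, 53 correct calls means
# tp=53 and fn=5 with no negatives, so accuracy equals sensitivity here.
m = confusion_metrics(tp=53, fn=5, tn=0, fp=0)
print(round(m["accuracy"], 3))          # 0.914
```

The resulting 64-dimensional vectors could then be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`. Keeping separate upstream and downstream counts (128 features) would be an equally plausible design that preserves which side of the site each triplet came from.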