Scientific Reports, vol. 16, no. 1, 2026 (SCI-Expanded, Scopus)
Predicting disease-associated peptides is a challenging task in bioinformatics, hindered mainly by the lack of reliable negative datasets, which leads to biased predictions. In this study, we propose a one-class classification approach that relies exclusively on positive-labeled data. We employed three classifiers, namely One-Class Support Vector Machines (OCSVM), Isolation Forest, and Autoencoders, to classify disease-associated peptides, with Autoencoders yielding the best results. The Autoencoders, trained on the positive dataset, effectively differentiated inliers from outliers, a separation further evaluated using mean reconstruction errors. Our method combines various sequence-based features. This framework provides an efficient solution for predicting disease-associated peptides and overcomes the limitations of traditional binary classification approaches. To enhance interpretability and peptide prioritization, we introduce a new scoring metric, the Disease Peptide Anomaly Score (DPAS), which combines model-derived anomaly scores with feature importance values obtained using SHAP (SHapley Additive exPlanations). DPAS facilitates the ranking of peptides based on their likelihood of being disease-associated, offering a robust and interpretable approach for peptide biomarker discovery.
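The one-class idea described in the abstract (fit on positive peptides only, then score how far a new peptide deviates from them) can be illustrated with a minimal, standard-library sketch. Everything in it is an assumption made for illustration: amino acid composition stands in for the paper's unspecified sequence-based features, a simple distance-to-centroid scorer stands in for OCSVM, Isolation Forest, and the Autoencoder, and the peptide strings are arbitrary placeholders rather than data from the study. It does not implement DPAS, whose formula is not given in the abstract.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(peptide):
    """Return the fraction of each standard amino acid in the peptide."""
    if not peptide:
        raise ValueError("peptide must be a non-empty string")
    counts = Counter(peptide.upper())
    return [counts[a] / len(peptide) for a in AMINO_ACIDS]


class CentroidOneClass:
    """Toy one-class scorer (hypothetical stand-in, not the paper's model).

    The anomaly score is the Euclidean distance from the centroid of the
    positive training set; larger means less similar to the positives.
    """

    def fit(self, X, quantile=0.9):
        n_features = len(X[0])
        self.center = [sum(row[j] for row in X) / len(X) for j in range(n_features)]
        # Threshold taken from the training scores at the given quantile.
        train_scores = sorted(self.score(x) for x in X)
        idx = min(len(train_scores) - 1, int(quantile * len(train_scores)))
        self.threshold = train_scores[idx]
        return self

    def score(self, x):
        return math.dist(x, self.center)

    def is_inlier(self, x):
        return self.score(x) <= self.threshold


# Placeholder "positive" peptides; only positives are used for fitting.
positives = ["KLVFFAED", "GSNKGAII", "LVFFAEDV", "KGAIIGLM", "VGSNKGAI"]
model = CentroidOneClass().fit([aa_composition(p) for p in positives])

# Rank unseen candidates by ascending anomaly score (most positive-like first).
candidates = ["KLVFGAED", "WWWWWWWW"]
ranked = sorted(candidates, key=lambda p: model.score(aa_composition(p)))
for peptide in ranked:
    features = aa_composition(peptide)
    print(peptide, round(model.score(features), 3), model.is_inlier(features))
```

In this sketch the tryptophan-only candidate lies far from every training composition, so it scores above the threshold and ranks last. A real pipeline would replace the centroid scorer with a trained one-class model and a richer feature set.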