Prediction of Linear Cationic Antimicrobial Peptides Active against Gram-Negative and Gram-Positive Bacteria Based on Machine Learning Models


Creative Commons License

Soylemez U. G. , Yousef M., KESMEN Z., Buyukkiraz M. E. , Bakir-Gungor B.

APPLIED SCIENCES-BASEL, vol.12, no.7, 2022 (Journal Indexed in SCI) identifier identifier

  • Publication Type: Article / Article
  • Volume: 12 Issue: 7
  • Publication Date: 2022
  • Doi Number: 10.3390/app12073631
  • Title of Journal : APPLIED SCIENCES-BASEL
  • Keywords: antimicrobial peptide (AMP), machine learning, classification model, antimicrobial peptide prediction, antimicrobial activity, physico-chemical properties, linear cationic antimicrobial peptides, DESIGN, DISCOVERY, PROTEIN, CLASSIFIER, ALGORITHMS, DEFENSIN, SYSTEMS, TOOL

Abstract

Antimicrobial peptides (AMPs) are considered as promising alternatives to conventional antibiotics in order to overcome the growing problems of antibiotic resistance. Computational prediction approaches receive an increasing interest to identify and design the best candidate AMPs prior to the in vitro tests. In this study, we focused on the linear cationic peptides with non-hemolytic activity, which are downloaded from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP). Referring to the MIC (Minimum inhibition concentration) values, we have assigned a positive label to a peptide if it shows antimicrobial activity; otherwise, the peptide is labeled as negative. Here, we focused on the peptides showing antimicrobial activity against Gram-negative and against Gram-positive bacteria separately, and we created two datasets accordingly. Ten different physico-chemical properties of the peptides are calculated and used as features in our study. Following data exploration and data preprocessing steps, a variety of classification algorithms are used with 100-fold Monte Carlo Cross-Validation to build models and to predict the antimicrobial activity of the peptides. Among the generated models, Random Forest has resulted in the best performance metrics for both Gram-negative dataset (Accuracy: 0.98, Recall: 0.99, Specificity: 0.97, Precision: 0.97, AUC: 0.99, F1: 0.98) and Gram-positive dataset (Accuracy: 0.95, Recall: 0.95, Specificity: 0.95, Precision: 0.90, AUC: 0.97, F1: 0.92) after outlier elimination is applied. This prediction approach might be useful to evaluate the antibacterial potential of a candidate peptide sequence before moving to the experimental studies.