Antimicrobial Benchmark Dataset
Bacteria are developing resistance to antibiotics at an alarming rate. Antimicrobial peptides (AMPs) offer the potential for treating antibiotic-resistant strains of bacteria. The mechanisms that AMPs employ to kill bacteria are not clearly understood and remain an open problem in biology. Characterizing the structure(s) that AMPs adopt to perform their function promises to provide insight into these mechanisms. A popular set of features used to study AMPs has been the amino acid sequence of known AMPs. However, structure is a stronger indicator of function than the amino acid sequence alone. Our research group investigates how to exploit both the sequence and the structure of AMPs to characterize their behavior and to identify other peptides/proteins that exhibit similar behavior. To serve as a benchmark for us and other groups, the links below provide two datasets to assist in applying machine learning methods to the study of AMPs. This work is from one of our undergraduate research students, Emma Macaluso.
- Negative Set (1613 PDBs)
- Positive Set (182 PDBs)

This is the first dataset we know of that provides both sequence and structure information for use in ML models to study AMP characteristics.
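As a rough illustration of how both sequence and structure could be read from a single entry, the sketch below pulls the one-letter amino acid sequence and the alpha-carbon coordinates out of the text of a PDB file. It assumes the files follow the standard fixed-column PDB format; the function name `parse_ca_trace` is hypothetical and this is not the pipeline used to build the datasets.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def parse_ca_trace(pdb_text):
    """Return (sequence, coords) for the first chain of the first model.

    Hypothetical helper: sequence is a one-letter string and coords is a
    list of (x, y, z) tuples, one per alpha-carbon ATOM record.
    """
    sequence = []
    coords = []
    first_chain = None
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # only the first model of a multi-model (e.g. NMR) file
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[12:16].strip() != "CA":
            continue  # keep alpha carbons only
        if line[16] not in (" ", "A"):
            continue  # skip alternate locations other than the first
        chain = line[21]
        if first_chain is None:
            first_chain = chain
        if chain != first_chain:
            continue  # keep only the first chain encountered
        # Non-standard residues are mapped to "X".
        sequence.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
        coords.append(
            (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        )
    return "".join(sequence), coords
```

Calling `parse_ca_trace(open(path).read())` on each file, with a label of 1 for the positive set and 0 for the negative set, would give paired sequence and coordinate features for a classifier. Restricting to the first chain and first model is a simplification made here for brevity.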