2023 AIChE Annual Meeting
Development of Machine-Learning Models for the Prediction of Membrane Active Peptides with Cell-Penetrating Capabilities Based on Amino Acid Periodicities
Membrane-active peptides have recently shown promise as alternatives to conventional small-molecule therapeutics. Upon interacting with the cell membrane, these peptides act either by translocating across it to deliver cargo or by disrupting it, resulting in cell lysis. Cell-penetrating peptides (CPPs), in particular, cross the cell membrane while transporting bioactive molecules such as plasmid DNA, nanoparticles, imaging agents for disease diagnostics, other peptides, and proteins. CPPs consist of fewer than 30 amino acids and are cationic, amphipathic, and/or hydrophobic in nature. These properties make them well suited both to carrying cargo and to interacting with the cellular environment without disrupting it. However, designing CPPs experimentally in the wet lab is time-consuming and expensive, so computational prediction of their activity is highly desirable for making experimental design more efficient. To this end, we developed and trained support vector machine (SVM) models to discriminate between peptides with and without cell-penetrating activity, using the MLCPP 2.0 datasets, which include both training and independent test sets. We first vectorized the sequences using the Blocks Substitution Matrix (BLOSUM100). Because oscillations in amino acid periodicities have previously been shown to categorize peptide structure and function, we applied the Fourier transform to each BLOSUM100-based vector to measure the amplitudes of these periodicities. This yielded 160 frequency descriptors (Fourier transform coefficients) per peptide, which served as the input features to our SVM models. We used 10-fold cross-validation, i.e., 10 training/testing splits, to train the models and tune their hyperparameters. After selecting the best hyperparameters, we applied a criteria-based feature selection algorithm to rank the features.
The resulting ranking allowed us to identify the features most predictive of the cell-penetrating activity of peptides. We then evaluated the models on the blind test sets using the best features found and assessed their performance with several metrics. We also incorporated additional physicochemical features and amino acid properties into our models in an attempt to improve prediction accuracy. Compared with state-of-the-art studies, our novel featurization and feature selection methods produced promising results.
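The BLOSUM100-plus-Fourier featurization described above can be illustrated with a minimal sketch. The abstract does not state how the 160 coefficients are obtained, so this sketch assumes that each residue is replaced by its 20-value BLOSUM100 row (giving 20 numeric signals along the sequence), that each signal is zero-padded to 30 positions, and that the amplitudes of the first 8 discrete Fourier coefficients are kept, which gives 20 × 8 = 160 features. The padding length, coefficient count, and function names are illustrative assumptions, and the toy matrix in the usage example is not BLOSUM100; real scores must come from a trusted copy of the matrix.

```python
import cmath
import math
from typing import Dict, List

# Substitution matrix as a nested dict: matrix[residue][column] -> score.
# ASSUMPTION: the caller supplies real BLOSUM100 scores; none are bundled here.
Matrix = Dict[str, Dict[str, float]]


def substitution_signals(seq: str, matrix: Matrix, alphabet: List[str]) -> List[List[float]]:
    """Replace each residue by its matrix row, returning one signal per column.

    The result holds len(alphabet) signals, each len(seq) values long.
    """
    return [[float(matrix[res][col]) for res in seq] for col in alphabet]


def dft_amplitudes(signal: List[float], n_points: int, n_coeffs: int) -> List[float]:
    """Zero-pad `signal` to n_points and return |X_k| for k = 0 .. n_coeffs - 1."""
    if len(signal) > n_points:
        raise ValueError("signal is longer than n_points")
    x = signal + [0.0] * (n_points - len(signal))
    return [
        abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points) for n in range(n_points)))
        for k in range(n_coeffs)
    ]


def fourier_features(seq: str, matrix: Matrix, alphabet: List[str],
                     n_points: int = 30, n_coeffs: int = 8) -> List[float]:
    """Concatenate the DFT amplitudes of every substitution-score signal.

    With a 20-letter alphabet and these (assumed) defaults: 20 * 8 = 160 features.
    """
    features: List[float] = []
    for signal in substitution_signals(seq, matrix, alphabet):
        features.extend(dft_amplitudes(signal, n_points, n_coeffs))
    return features


if __name__ == "__main__":
    # Toy 3-letter matrix for illustration only; these are NOT BLOSUM100 scores.
    toy_alphabet = ["K", "L", "R"]
    toy_matrix = {
        "K": {"K": 6, "L": -4, "R": 2},
        "L": {"K": -4, "L": 5, "R": -3},
        "R": {"K": 2, "L": -3, "R": 7},
    }
    feats = fourier_features("KLRKLRKLR", toy_matrix, toy_alphabet)
    print(len(feats))  # 3 signals x 8 coefficients = 24
```

Zero-padding every peptide to the same length keeps the frequency grid identical across sequences, so coefficient k refers to the same periodicity for a 12-residue peptide and a 29-residue one; the choice of 30 simply follows the fewer-than-30-residue length stated for CPPs.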
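The training, feature ranking, and blind-test evaluation steps can also be sketched. This is not the authors' implementation: it swaps in a pure-Python linear SVM trained with the Pegasos sub-gradient method, tunes the regularization strength λ by 10-fold cross-validation, ranks features by the magnitude of the learned weights (an SVM-RFE-style criterion chosen here as an assumption, since the abstract does not name its ranking criteria), refits on the top-ranked features, and reports accuracy, sensitivity, specificity, and the Matthews correlation coefficient (MCC) on a held-out set. The synthetic data, the λ grid, and all function names are hypothetical, and the original models may use a kernel SVM and a different selection algorithm.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def train_linear_svm(X: Sequence[Vector], y: Sequence[int], lam: float,
                     epochs: int = 20, seed: int = 0) -> Vector:
    """Pegasos solver for a linear SVM (hinge loss + L2), labels in {+1, -1}.
    A constant 1.0 is appended to each sample; the last weight acts as a bias."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss active: move toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def predict(w: Vector, X: Sequence[Vector]) -> List[int]:
    """Sign of the linear decision function (bias is the last weight)."""
    return [1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1 for x in X]


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == -1 and p == -1 for t, p in pairs)
    fp = sum(t == -1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == -1 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def cross_validate(X, y, lambdas: Sequence[float], k: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Return (best lambda, its mean k-fold accuracy). Needs len(X) >= k."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_acc = lambdas[0], -1.0
    for lam in lambdas:
        accs = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            w = train_linear_svm([X[i] for i in train], [y[i] for i in train], lam)
            accs.append(metrics([y[i] for i in fold], predict(w, [X[i] for i in fold]))["accuracy"])
        if sum(accs) / k > best_acc:
            best_lam, best_acc = lam, sum(accs) / k
    return best_lam, best_acc


def rank_features(w: Vector) -> List[int]:
    """Feature indices sorted by |weight|, largest first (bias excluded)."""
    return sorted(range(len(w) - 1), key=lambda j: abs(w[j]), reverse=True)


if __name__ == "__main__":
    rng = random.Random(42)

    def synthetic(n: int):
        # Hypothetical data: feature 0 carries the label, features 1-4 are noise.
        X, y = [], []
        for _ in range(n):
            label = rng.choice([-1, 1])
            X.append([label + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(4)])
            y.append(label)
        return X, y

    X_train, y_train = synthetic(200)
    X_test, y_test = synthetic(50)  # stands in for the blind test set
    lam, cv_acc = cross_validate(X_train, y_train, [1e-3, 1e-2, 1e-1])
    top = rank_features(train_linear_svm(X_train, y_train, lam))[:2]

    def keep(X):
        return [[x[j] for j in top] for x in X]

    w_top = train_linear_svm(keep(X_train), y_train, lam)  # refit on selected features
    print(lam, round(cv_acc, 3), top, metrics(y_test, predict(w_top, keep(X_test))))
```

In this formulation λ plays the role of an inverse cost: the Pegasos objective matches the usual C-SVM when λ = 1/(C·n) for n training samples, so a larger λ means stronger regularization. Ranking by absolute weight is only meaningful for a linear model on comparably scaled features; a kernel SVM would need a different criterion.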
Keywords: Cell Penetrating Peptides, Machine Learning, Support Vector Machines, Feature Selection, BLOSUM100