(526h) A Novel Approach for Protein Structure Prediction
2014 AIChE Annual Meeting, Food, Pharmaceutical & Bioengineering Division
Session: Protein Structure, Function, and Stability I
The approach is hierarchical, updates the framework previously developed by our group [2-5], and includes improved methods for secondary structure prediction [3], protein domain definition, Cβ tertiary contact prediction [4], β-sheet topology prediction [5], tertiary structure prediction, and refinement [6]. Secondary structure (SS) prediction is based on a support vector machine (SVM) model that takes as features the predictions of four published methods. The SS prediction model consists of three one-vs-all binary classifiers that have been combined to maximize the prediction of helices and strands. The β-sheet topology method has three components: (i) an SVM model for β-contact prediction; (ii) a mixed-integer linear programming (MILP) model for strand pair alignment; and (iii) a MILP model for β-sheet topology. The SVM-predicted β-contact probabilities are used as input to the strand pair MILP model, which produces a rank-ordered list of optimal strand pair alignments for each possible pair of β-strands. A rank-ordered list of optimal β-sheet topologies is then generated from these strand pair alignments. The prediction of tertiary Cβ contacts is based on the Delaunay triangulation of the Cβ coordinates of structural templates, and a consensus score based on the template Z-scores is used to rank the observed contacts.
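The consensus ranking of template-derived contacts can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the version below is an assumption: each contact is scored by the sum of the Z-scores of the templates in which it is observed, divided by the total Z-score of all templates. The function name `rank_contacts` and both input dictionaries are hypothetical, and the per-template contact sets are taken as already extracted (for example, from the edges of a Delaunay triangulation of the Cβ coordinates).

```python
from collections import defaultdict


def rank_contacts(template_contacts, template_zscores):
    """Rank residue-residue contacts by a Z-score-weighted consensus.

    template_contacts: dict mapping template id -> set of (i, j) residue
        index pairs observed as contacts in that template.
    template_zscores: dict mapping template id -> Z-score of that template.

    The weighting scheme is an illustrative assumption, not the
    published scoring function.
    """
    total = sum(template_zscores[t] for t in template_contacts)
    if total <= 0:
        raise ValueError("total template Z-score must be positive")

    scores = defaultdict(float)
    for template, contacts in template_contacts.items():
        # Normalise pair order so (i, j) and (j, i) count once per template.
        unique_pairs = {(min(i, j), max(i, j)) for i, j in contacts}
        for pair in unique_pairs:
            scores[pair] += template_zscores[template]

    # Highest consensus first; ties broken by residue indices.
    return sorted(
        ((pair, score / total) for pair, score in scores.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Toy example with three hypothetical templates.
contacts = {
    "A": {(1, 5), (2, 9)},
    "B": {(5, 1), (3, 7)},
    "C": {(1, 5), (3, 7), (2, 9)},
}
zscores = {"A": 10.0, "B": 5.0, "C": 5.0}
ranked = rank_contacts(contacts, zscores)
print(ranked)  # [((1, 5), 1.0), ((2, 9), 0.75), ((3, 7), 0.5)]
```

A contact seen in every template receives a score of 1.0, while a contact seen only in low-scoring templates falls toward the bottom of the list.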
The presented approach contains two pipelines for template-based tertiary structure prediction, consensus-based template identification and biclustering-based template identification, as well as ab initio structure prediction by grey-box global optimization with ARGONAUT. Consensus-based template identification uses consistency with the predicted secondary structure, the predicted β-sheet topology, and the predicted Cβ contacts to re-rank and select structural templates. Biclustering-based template identification clusters templates (columns) according to extracted pairwise distances (rows), in combination with manual clustering of these distances using an alignment confidence derived from the position-specific scoring matrix. The biclustering helps identify consistency in the extracted template distances while allowing structural variability between regions with less confident alignments. The final row and column clusters represent sets of similar residue-residue distances that are consistent across several templates, and are used to identify the single template that best fits these consensus distances. For ab initio structure prediction, initial models are first generated from several subsets of the predicted Cβ and β-sheet contacts. ARGONAUT is then used to fit and optimize surrogate models that minimize a structure-based objective function defined by the pairwise distances of the predicted residue contacts. Solutions of the surrogate models are added to the set of predicted structures and used in the next iterations of model fitting and optimization.
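To make the idea of a contact-based objective concrete, the sketch below scores a candidate structure by how far its predicted contacts are from being satisfied. The abstract does not specify the functional form, so this is only one plausible choice: a sum of squared violations beyond a distance cutoff. The function name `contact_objective`, the 8 Å cutoff, and the coordinate layout are all assumptions made for illustration.

```python
import math


def contact_objective(coords, contacts, cutoff=8.0):
    """Sum of squared distance violations over predicted contacts.

    coords: dict mapping residue index -> (x, y, z) Cβ coordinates in Å.
    contacts: iterable of (i, j) residue pairs predicted to be in contact.
    cutoff: assumed contact distance threshold in Å (hypothetical value).

    A pair contributes zero when its Cβ-Cβ distance is within the cutoff,
    and the squared excess distance otherwise.
    """
    total = 0.0
    for i, j in contacts:
        distance = math.dist(coords[i], coords[j])
        total += max(0.0, distance - cutoff) ** 2
    return total


# Toy structure: residues 1 and 2 are 6 Å apart, residues 1 and 3 are 10 Å apart.
toy_coords = {1: (0.0, 0.0, 0.0), 2: (6.0, 0.0, 0.0), 3: (0.0, 10.0, 0.0)}
toy_contacts = [(1, 2), (1, 3)]
toy_value = contact_objective(toy_coords, toy_contacts)
print(toy_value)  # 4.0
```

In a surrogate-based loop, a function of this kind would play the role of the black-box evaluation: each sampled structure yields one objective value, and those values are what the surrogate model is fitted to.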
The final step of the approach is the refinement of the generated structural models using Princeton_TIGRESS, as well as a novel molecular dynamics-based refinement method. We present results for the benchmarking of the individual methods, as well as our overall results from the CASP11 competition.