Predicting the change in protein binding affinity upon mutation (ΔΔG) is an important challenge for protein design. Previous work has demonstrated that the physical features of protein interfaces can be predictive of binding poses: a random forest classifier trained on expected persistent pairwise interaction (EPPI) features could eliminate false poses that initially seemed computationally feasible. Building on this, we hypothesized that the EPPI feature space could be extended to other facets of protein design, such as predicting experimental ΔΔG upon mutation at protein interfaces. To test this, we extracted the change in EPPI features for protein interface mutations in the SKEMPI v2.0 database and trained a random forest regressor to predict ΔΔG.
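The regression input described above is a "change in features" vector for each mutation. As a minimal sketch, assuming each structure has already been summarized as a mapping from feature name to value, the change can be taken as mutant minus wild type in a fixed feature order, matching the sign convention of ΔΔG. The feature names `hbond_pairs`, `salt_bridge_pairs` and `hydrophobic_pairs` and the helper `delta_features` are hypothetical placeholders, not the actual EPPI feature set.

```python
from typing import Dict, List

# Hypothetical feature names for illustration only; the real EPPI features differ.
FEATURE_NAMES: List[str] = ["hbond_pairs", "salt_bridge_pairs", "hydrophobic_pairs"]


def delta_features(wild_type: Dict[str, float],
                   mutant: Dict[str, float],
                   names: List[str] = FEATURE_NAMES) -> List[float]:
    """Return (mutant - wild type) for each feature, in a fixed order.

    Features absent from a structure are treated as 0.0 (an assumption of this sketch).
    """
    return [mutant.get(n, 0.0) - wild_type.get(n, 0.0) for n in names]


# Toy values, not data from SKEMPI v2.0.
wt = {"hbond_pairs": 4.0, "salt_bridge_pairs": 1.0, "hydrophobic_pairs": 6.5}
mut = {"hbond_pairs": 3.0, "salt_bridge_pairs": 1.0, "hydrophobic_pairs": 7.0}
print(delta_features(wt, mut))  # [-1.0, 0.0, 0.5]
```

Keeping the feature order fixed is what lets each delta vector serve as one row of the regressor's training matrix, with the measured ΔΔG as its target.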
The ensemble model, which uses an ensemble average of the change in EPPI features, achieved state-of-the-art performance, with a Pearson correlation of 0.723 in cross-validation and 0.716 on a blind validation set. Additionally, a faster model that uses only a single state for the mutated structure performed comparably to flex_ddg, the previous state-of-the-art method, in two orders of magnitude less time. Notably, this approach relies only on efficiently computed structural features, which eliminates the need for costly web servers, molecular dynamics simulations, ensemble generation, and deep learning architectures. Finally, the models generated better predictions when limited experimental data were added: for some unseen complexes, as few as one or two experimental data points improved both the correlation and the mean absolute error. This transfer learning performance suggests an application in the efficient screening of unseen complexes, where the quick retraining process and iterative experimental characterization can be combined to carefully and confidently select mutations that produce the desired effect.