Automated Statistical Design of Experiments for Metabolic Engineering
2015 Synthetic Biology: Engineering, Evolution & Design (SEED), Poster Session A
We approach this design-of-experiments (DoE) challenge as a combinatorial optimization problem. The goal is to find a strain with high production of the target compound among all potential strains in the pathway design space, while minimizing the total number of constructs that must actually be built and tested. The field of combinatorial optimization offers many algorithms for such problems. We propose an Estimation of Distribution Optimization (EDO) framework with an iterated regression model of the target metabolic pathway, trained on proteomics and metabolomics data. This approach uses multiple rounds of experiments to home in on the goal strain. In each round, we first build a statistical model from the previous round’s omics data to predict target production for any potential strain in the design space. We then sample from this model to generate the DoE for the next round. After each round, the model’s accuracy increases, allowing it to direct the next round of experiments more effectively.
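The round structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the toolkit itself: the design space (`LEVELS`, `N_GENES`), the toy assay `measure`, and the helper names are all hypothetical. A simple additive marginal-means model stands in for the machine learning regression, and it is fitted to the design levels directly rather than to omics measurements.

```python
import itertools
import math
import random
from collections import defaultdict

LEVELS = (0, 1, 2)  # hypothetical expression levels available per gene
N_GENES = 4         # hypothetical pathway size (3**4 = 81 candidate strains)


def measure(strain):
    """Stand-in for building and assaying a strain (toy ground truth)."""
    optimum = (2, 1, 2, 0)
    return -float(sum((s - o) ** 2 for s, o in zip(strain, optimum)))


def fit_model(data):
    """Fit an additive model: mean production observed per (gene, level)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for strain, production in data.items():
        for gene, level in enumerate(strain):
            sums[gene, level] += production
            counts[gene, level] += 1
    overall = sum(data.values()) / len(data)

    def predict(strain):
        # Unseen (gene, level) pairs fall back to the overall mean.
        terms = [
            sums[g, l] / counts[g, l] if counts[g, l] else overall
            for g, l in enumerate(strain)
        ]
        return sum(terms) / len(terms)

    return predict


def next_round(predict, tested, batch, rng, temperature=1.0):
    """Sample untested strains, favoring those with high predicted production."""
    space = itertools.product(LEVELS, repeat=N_GENES)
    candidates = [s for s in space if s not in tested]
    scores = [predict(s) for s in candidates]
    top = max(scores)
    pool = [(s, math.exp((x - top) / temperature))
            for s, x in zip(candidates, scores)]
    chosen = []
    for _ in range(min(batch, len(pool))):
        # Weighted draw without replacement.
        i = rng.choices(range(len(pool)), weights=[w for _, w in pool], k=1)[0]
        chosen.append(pool.pop(i)[0])
    return chosen


def run(rounds=4, batch=8, seed=0):
    """Round 1 is a random design; later rounds are sampled from the model."""
    rng = random.Random(seed)
    space = list(itertools.product(LEVELS, repeat=N_GENES))
    data = {s: measure(s) for s in rng.sample(space, batch)}
    for _ in range(rounds - 1):
        predict = fit_model(data)
        for strain in next_round(predict, data, batch, rng):
            data[strain] = measure(strain)
    best = max(data, key=data.get)
    return best, data
```

Calling `run()` builds 32 of the 81 possible strains over four rounds and returns the best strain found together with all measurements. Sampling from the model, rather than always taking its top predictions, keeps some exploration in each round; the `temperature` parameter is an illustrative knob for that trade-off.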
Previous approaches to guiding metabolic engineering show that statistical models can capture non-obvious interactions between pathway components, but they fail to realize the potential of machine learning as a tool for automated DoE. These methods, which employ principal component analysis (PCA) and linear regression, are limited in their ability to capture arbitrarily complex interactions in high-dimensional data sets and require significant manual interpretation. Machine learning algorithms allow us to train arbitrarily complex regression models, and the EDO framework provides automated, statistically sound DoE.
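The limitation of a purely linear model can be seen in a toy case. The sketch below assumes a hypothetical 2x2 factorial experiment in which production is high only when two enzymes are expressed at matching levels. A main-effects (additive) fit cannot represent this pattern, while a model with a pairwise term can. The data and function names are illustrative and not taken from the data sets used in this work.

```python
from itertools import product
from statistics import mean

# Toy 2x2 factorial: production is high only when both levels match.
observations = {
    (a, b): (1.0 if a == b else -1.0) for a, b in product((0, 1), repeat=2)
}


def additive_fit(obs):
    """Main-effects model: grand mean plus one offset per factor level."""
    grand = mean(obs.values())
    effect = {
        (factor, level):
            mean(y for x, y in obs.items() if x[factor] == level) - grand
        for factor in (0, 1)
        for level in (0, 1)
    }
    return lambda x: grand + sum(effect[f, x[f]] for f in (0, 1))


def interaction_fit(obs):
    """Model with a pairwise term: one mean per level combination."""
    return lambda x: obs[x]


additive = additive_fit(observations)
pairwise = interaction_fit(observations)

# Worst-case absolute prediction error of each model on the observed data.
additive_error = max(abs(additive(x) - y) for x, y in observations.items())
pairwise_error = max(abs(pairwise(x) - y) for x, y in observations.items())
```

Here every marginal mean is zero, so the additive model predicts the same value for all four strains and misses the interaction entirely. More flexible regression models are meant to avoid having to specify such interaction terms by hand.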
We test the effectiveness of this approach on several existing metabolic engineering data sets. Our toolkit predicts high-yield strains for different target natural products on distinct pathways and suggests additional experiments that are likely to yield even higher production levels. These predictions are computed without the need for manual interpretation; however, the toolkit also provides visualizations of high-dimensional omics data for qualitative representations of the pathway model.