2020 Virtual AIChE Annual Meeting
(35g) Development of Machine Learning Methods for Prediction of Microbial Metabolisms and Biosynthesis Performance (Invited Speaker)
In the second case study, we integrated data-driven methods with genome-scale metabolic models to assess microbial bio-production (yield, titer, and rate). Using Escherichia coli and Yarrowia lipolytica as examples, we organized and curated data sets drawn from recent biomanufacturing papers. We then augmented the features extracted from the literature (e.g., product and substrate types, bioreactor conditions, and genetic background) with additional features derived from genome-scale model simulations. To alleviate the challenges of sparse data sets, we employed data augmentation and ensemble learning. The hybrid framework demonstrated reasonably high cross-validation accuracy in predicting cell factory performance metrics under presumed bioprocess and pathway conditions. These predictions could be (1) used to assess and rank the factors that influence bio-production; (2) integrated with technoeconomic analysis for a priori estimation of cell factory outcomes (i.e., serving as a useful risk assessment tool); or (3) employed in conjunction with genome-scale modeling to improve computational design tools.
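The general shape of such a hybrid workflow can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the names (`literature_X`, `gem_feature`, `yields`) are hypothetical, and a bagged k-nearest-neighbor regressor stands in for whichever ensemble learner was actually used. The sketch covers only feature augmentation, ensemble learning, and k-fold cross-validation; it does not attempt the data augmentation step.

```python
import random
import statistics


def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training rows closest to x (Euclidean)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return statistics.fmean([train_y[i] for i in order[:k]])


def bagged_predict(train_X, train_y, x, n_models=25, k=3, seed=0):
    """Ensemble by bagging: each member is fit on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(train_X)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        boot_X = [train_X[i] for i in idx]
        boot_y = [train_y[i] for i in idx]
        preds.append(knn_predict(boot_X, boot_y, x, k))
    return statistics.fmean(preds)


def cross_validated_mae(X, y, n_folds=5):
    """K-fold cross-validation; returns mean absolute error on held-out rows."""
    errors = []
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            errors.append(abs(bagged_predict(train_X, train_y, X[i]) - y[i]))
    return statistics.fmean(errors)


# Synthetic stand-in data (NOT from the study). All features are scaled to [0, 1].
rng = random.Random(42)
n_rows = 40
# Hypothetical literature-derived features, e.g. scaled temperature and aeration.
literature_X = [[rng.random(), rng.random()] for _ in range(n_rows)]
# Hypothetical model-derived feature, e.g. a simulated maximum theoretical yield.
gem_feature = [rng.random() for _ in range(n_rows)]
# Toy target constructed so that it depends on both kinds of feature.
yields = [
    0.3 * lit[0] + 0.6 * g + rng.gauss(0, 0.02)
    for lit, g in zip(literature_X, gem_feature)
]

# Feature augmentation: append the model-derived column to each literature row.
hybrid_X = [lit + [g] for lit, g in zip(literature_X, gem_feature)]

mae_literature = cross_validated_mae(literature_X, yields)
mae_hybrid = cross_validated_mae(hybrid_X, yields)
print(f"literature-only MAE: {mae_literature:.3f}")
print(f"hybrid MAE:          {mae_hybrid:.3f}")
```

Because the toy target is built from the model-derived feature, any difference between the two errors here reflects only how the data were generated and says nothing about real bioprocess data. In practice the model-derived column would come from a genome-scale model simulation rather than a random number generator.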