2025 AIChE Annual Meeting

(394ad) Learning Feasible Bioprocess Kinetics with Semi-Supervised VAEs for LCA

Authors

Stefanos Xenios - Presenter, National Technical University of Athens
Antonis Kokossis, National Technical University of Athens
Konstantinos Mexis, National Technical University of Athens
This work explores the development of large-scale kinetic models of metabolism through the application of generative artificial intelligence, with a focus on variational autoencoders (VAEs) to efficiently represent and sample from complex biochemical parameter spaces across varying conditions and mutant strains. The aim is to accelerate and systematize the generation of feasible genome-scale kinetic models by integrating AI-driven generative modeling with established mechanistic modeling frameworks.

Among the available modeling strategies, Monte Carlo-based kinetic modeling continues to be a powerful method for capturing the behavior of genome-scale networks while accounting for uncertainty in kinetic parameters [1]. However, due to the limited availability of experimentally measured parameters, particularly at large scales, the task of parametrizing such models remains highly challenging. We address this issue by integrating VAEs trained on model populations generated via the ORACLE framework [2], which inherently produces ensembles of kinetic models satisfying thermodynamic and physiological constraints.
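As a rough illustration of how a Monte Carlo population of feasible models can be built, the sketch below draws random parameter sets and keeps only those that pass a feasibility check. It is a toy under stated assumptions, not the ORACLE procedure: the four-parameter 2x2 Jacobian, the uniform sampling range, and the names `is_locally_stable` and `sample_population` are all hypothetical.

```python
import random


def is_locally_stable(params):
    """Toy feasibility test on the 2x2 Jacobian [[-a, b], [c, -d]].

    A 2x2 matrix has both eigenvalues in the open left half-plane exactly
    when its trace is negative and its determinant is positive.
    """
    a, b, c, d = params
    trace = -a - d
    det = a * d - b * c
    return trace < 0.0 and det > 0.0


def sample_population(n_target, max_draws, rng):
    """Rejection sampling: draw parameter sets, keep the feasible ones."""
    accepted = []
    draws = 0
    while len(accepted) < n_target and draws < max_draws:
        # Hypothetical prior: each parameter uniform on [0.1, 1.0].
        params = tuple(rng.uniform(0.1, 1.0) for _ in range(4))
        draws += 1
        if is_locally_stable(params):
            accepted.append(params)
    return accepted, draws


rng = random.Random(0)  # fixed seed so the run is reproducible
population, draws = sample_population(n_target=50, max_draws=10_000, rng=rng)
acceptance_rate = len(population) / draws
```

The acceptance rate is the quantity a learned generative model would be expected to raise: the more draws that land in the feasible region, the fewer are wasted.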

The VAE learns a compressed latent representation of the parameter space, enabling both reconstruction of feasible models and guided sampling of new parameter sets with an improved likelihood of biological validity. By embedding prior knowledge and omics data (such as fluxomics, metabolomics, and chemostat fermentation data) into the training pipeline, the VAE reduces uncertainty and narrows the sampling space for Monte Carlo simulations, thereby increasing the efficiency and quality of model generation. To further guide the learning process, we implemented a semi-supervised training strategy in which selected Key Performance Indicators (KPIs), including model stability and experimental agreement scores, were used to label subsets of the training data. This biases model generation toward high-quality, experimentally consistent solutions [3].
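One common way to write a semi-supervised VAE objective is sketched below. It is a minimal illustration, not necessarily the loss used in this work. It combines a reconstruction error, the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior [3], and a classification term that is added only for labeled samples. The squared-error reconstruction term, the binary label, the weights `beta` and `alpha`, and all function names are assumptions made for illustration.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def squared_error(x, x_hat):
    """Reconstruction error between a parameter set and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def binary_cross_entropy(label, prob, eps=1e-12):
    """Penalty for a predicted quality probability against a 0/1 KPI label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def semi_supervised_loss(x, x_hat, mu, logvar, label=None, prob=None,
                         beta=1.0, alpha=1.0):
    """Per-sample loss; the label term is skipped for unlabeled samples."""
    loss = squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, logvar)
    if label is not None:
        loss += alpha * binary_cross_entropy(label, prob)
    return loss


# Unlabeled sample: only reconstruction and KL terms contribute.
unlabeled = semi_supervised_loss([1.0, 2.0], [1.1, 1.9], [0.2, -0.1], [0.0, 0.0])

# Labeled sample: the same terms plus the KPI classification penalty.
labeled = semi_supervised_loss([1.0, 2.0], [1.1, 1.9], [0.2, -0.1], [0.0, 0.0],
                               label=1, prob=0.8)
```

In this formulation the labeled subset shapes the latent space through the extra term, while unlabeled models still contribute through reconstruction and regularization.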

The KPIs evaluate model robustness and accuracy: model stability, consistency with thermodynamic laws, agreement with experimental data, and predictive performance. They serve both to assess model quality and to guide the semi-supervised training of the VAE. Because a subset of the generated models is labeled according to KPI performance, the VAE learns the features that distinguish high-quality models during training. This facilitates targeted exploration of the parameter space, so that newly generated kinetic models are not only feasible but also aligned with experimentally validated behavior, supporting more reliable downstream strain and process optimization.
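The KPI-based labeling step could look roughly like the following sketch. The two KPIs shown, the threshold `error_tol`, and the names `ModelKPIs` and `kpi_label` are hypothetical, since the abstract does not specify how scores are turned into labels. Models whose KPIs have not been evaluated get no label (`None`), which is what makes the training set only partially labeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelKPIs:
    # Largest real part among the Jacobian eigenvalues; negative means locally stable.
    max_real_eigenvalue: Optional[float]
    # Hypothetical normalized mismatch between model output and measurements.
    experimental_error: Optional[float]


def kpi_label(kpis, error_tol=0.1):
    """Return 1 (high quality), 0 (low quality), or None (unlabeled)."""
    if kpis.max_real_eigenvalue is None or kpis.experimental_error is None:
        return None  # KPIs not evaluated: the model stays unlabeled
    stable = kpis.max_real_eigenvalue < 0.0
    consistent = kpis.experimental_error <= error_tol
    return 1 if stable and consistent else 0


population = [
    ModelKPIs(-0.5, 0.05),  # stable and consistent with data
    ModelKPIs(-0.2, 0.40),  # stable but poor experimental agreement
    ModelKPIs(0.3, 0.02),   # good agreement but unstable
    ModelKPIs(None, None),  # not evaluated
]
labels = [kpi_label(k) for k in population]
```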

This integrated approach enhances the throughput and reliability of large-scale kinetic model development, embedding knowledge from both mechanistic constraints and experimental datasets. The framework was successfully applied to an E. coli strain and its various mutants [4].

This project has received funding from the Circular Bio-based Europe Joint Undertaking (JU) under the European Union’s Horizon Europe Research and Innovation Programme under Grant Agreement No. 101157528. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the Circular Bio-based Europe Joint Undertaking (JU) can be held responsible for them.

[1] Chakrabarti, A., Miskovic, L., Soh, K. C., & Hatzimanikatis, V. (2013). Towards kinetic modeling of genome-scale metabolic networks without sacrificing stoichiometric, thermodynamic and physiological constraints. Biotechnology Journal, 8(9), 1043–1057.

[2] Miskovic, L., & Hatzimanikatis, V. (2011). Production of biofuels and biochemicals: in need of an ORACLE. Trends in Biotechnology, 29(6), 296–306.

[3] Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. arXiv preprint, arXiv:1312.6114. Available at: https://arxiv.org/abs/1312.6114

[4] Ishii, N., Nakahigashi, K., Baba, T., Robert, M., Soga, T., Kanai, A., et al. (2007). Multiple high-throughput analyses monitor the response of E. coli to perturbations. Science, 316(5824), 593–597.