Metabolic Engineering X
Integration of Transcriptomic Data in Genome-Scale Metabolic Models Predicts in Vitro Intracellular Central Carbon Metabolic Fluxes with High Correlation in Escherichia coli and Saccharomyces cerevisiae
Determination of systemic changes in intracellular metabolic fluxes of microorganisms is important to understand fundamental mechanisms of their metabolic responses as well as to identify molecular targets for metabolic engineering. The experimental quantification of intracellular metabolic fluxes is challenging not only because of the extensive instrumentation required for these methods but also because of the limited number of fluxes that can be measured.
One of the alternative methods that is widely used for system-level analysis of metabolic networks is a computational modeling approach called flux balance analysis (FBA), which predicts metabolic flux distributions at steady state by making use of in silico genome-scale metabolic models. Since these models are underdetermined in general, context-specific and physiologically meaningful flux solutions need to be narrowed down from the space of all possible distributions.
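For reference, FBA is conventionally posed as a linear program. The notation below is the standard textbook form, not something stated in this abstract: S is the stoichiometric matrix, v the vector of reaction fluxes, c the objective weights (for example, selecting the biomass reaction), and v^min, v^max the flux bounds.

```latex
\[
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
\]
```

Because S typically has far more columns (reactions) than rows (metabolites), many flux vectors satisfy these constraints, which is why additional data are needed to select a physiologically meaningful solution.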
Transcriptomic data can be used to define flux bounds, objective functions, or both, to enhance the predictive power of in silico genome-scale metabolic models. Transcriptomic data have significant advantages for this purpose over other 'omics' platforms such as proteomics, metabolomics, and fluxomics because the layer of RNA transcripts is the only layer where a complete quantitative snapshot of all genome-wide molecular species is currently possible. In addition, changes in RNA abundance can be measured precisely in a highly automated process at a low cost relative to the amount of data gathered.
Because of these advantages of RNA transcript profiling, previous studies have integrated transcriptomic data with in silico genome-scale metabolic models. However, previous methods have limitations such as the requirement for multiple input data types, discretization or binarization of gene expression measurements according to user-defined thresholds, or the requirement for a priori knowledge of the biomass production rate or carbon source. Moreover, the accuracy of the intracellular fluxes predicted by these methods has not been validated against measured intracellular fluxes.
In this study, we developed a computational tool for predicting the intracellular metabolic flux distributions of E. coli and S. cerevisiae by integrating transcriptomic data with their in silico genome-scale metabolic models. We propose two template models to be integrated with gene expression data, the Yes Carbon source (YC) and No Carbon source (NC) models, and two optimization strategies, the Yes Biomass (YB) and No Biomass (NB) strategies, which can be chosen and combined depending on what is known about the carbon source or biomass flux. The YC model allows only the known carbon source to be taken up by the cell, while the NC model allows all carbon sources in the model to be taken up by the cell. The YB strategy seeks a metabolic flux distribution that allows the cell to achieve its maximum growth rate in an energy-efficient way by solving a two-step optimization problem: maximization of biomass followed by minimization of the Euclidean norm. The NB strategy can be applied when biomass flux is not a suitable objective function; it maximizes the Pearson correlation between metabolic flux and transcriptomic data. Transcriptomic data are used to constrain fluxes in the model for the YB strategy, while for the NB strategy they are used to define the objective function.
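One way to write the two strategies down is sketched below. The abstract does not give the exact formulation, so several pieces are assumptions: the Euclidean norm is taken over the whole flux vector v, the expression-derived bounds are written generically as l(g) and u(g) because the mapping from expression g to bounds is not specified here, and the symbols S, v_bio, and Z* are introduced only for illustration.

```latex
% YB strategy, step 1: maximize biomass under expression-derived bounds (assumed form)
\[
Z^{*} = \max_{v}\; v_{\mathrm{bio}}
\quad \text{s.t.} \quad S\,v = 0,\;\; l(g) \le v \le u(g)
\]
% YB strategy, step 2: among biomass-optimal solutions, minimize the Euclidean norm
\[
\min_{v}\; \lVert v \rVert_{2}
\quad \text{s.t.} \quad S\,v = 0,\;\; l(g) \le v \le u(g),\;\; v_{\mathrm{bio}} = Z^{*}
\]
% NB strategy: expression enters the objective instead of the bounds (assumed form)
\[
\max_{v}\; \operatorname{corr}(v, g)
\quad \text{s.t.} \quad S\,v = 0,\;\; v^{\min} \le v \le v^{\max}
\]
```

Under this reading, step 2 of the YB strategy minimizes a strictly convex function over a convex feasible set, so its minimizer is unique whenever the set is non-empty.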
The computational method and its validation process can be divided into five steps: 1) obtaining transcriptomic data measured under the conditions of interest, 2) obtaining in vitro flux data measured under exactly the same conditions as the transcriptomic data (for later comparison with the predicted fluxes), 3) integrating the transcriptomic data into the genome-scale metabolic model according to the model's gene-protein-reaction (GPR) associations, 4) solving FBA optimization problems with one of the two objective functions, depending on the availability of information on biomass flux or carbon source, and 5) calculating the correlation between the predicted fluxes and the measured fluxes.
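Step 3 can be illustrated with a minimal sketch of how gene-level expression might be propagated through a GPR rule to a reaction-level value. The abstract does not say which aggregation rule is used, so the convention below (AND takes the minimum, OR takes the sum) is an assumption, and the function name `reaction_expression` and the gene identifiers are hypothetical.

```python
import re


def reaction_expression(rule, expression):
    """Evaluate a GPR rule such as "(g1 and g2) or g3" on gene expression values.

    Assumed convention (not from the abstract): AND -> min, OR -> sum.
    `expression` maps gene identifiers to absolute expression levels.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    if not tokens:
        raise ValueError("empty GPR rule")
    pos = 0

    def parse_or():
        # OR has the lowest precedence: sum over alternative enzymes.
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            value += parse_and()
        return value

    def parse_and():
        # AND binds tighter: a complex is limited by its scarcest subunit.
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            value = min(value, parse_atom())
        return value

    def parse_atom():
        # A gene identifier or a parenthesized sub-rule.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of GPR rule")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        return float(expression[token])

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


# Hypothetical expression levels for three genes.
levels = {"g1": 5.0, "g2": 2.0, "g3": 1.5}
print(reaction_expression("(g1 and g2) or g3", levels))
```

A gene that is missing from `expression` raises a `KeyError` here; a real pipeline would need an explicit policy for unmeasured genes.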
On average, we observed good correlation between the predicted fluxes and the measured fluxes. The average uncentered Pearson correlation between predicted and measured fluxes was in the range of 0.7 to 0.9 in most cases, significantly outperforming existing methods. Our method predicted intracellular fluxes with an average uncentered Pearson correlation of around 0.6 even when the carbon source and objective function were unknown. In all cases, it took less than 5 seconds to calculate the flux distribution using our algorithm.
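The uncentered Pearson correlation used as the accuracy measure differs from the ordinary Pearson coefficient in that the means are not subtracted, which makes it the cosine of the angle between the two flux vectors. A minimal sketch follows; the function name and the example values are illustrative and are not data from the study.

```python
import math


def uncentered_pearson(predicted, measured):
    """Uncentered Pearson correlation: sum(x*y) / sqrt(sum(x^2) * sum(y^2))."""
    if len(predicted) != len(measured):
        raise ValueError("flux vectors must have the same length")
    dot = sum(p * m for p, m in zip(predicted, measured))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_m = math.sqrt(sum(m * m for m in measured))
    if norm_p == 0.0 or norm_m == 0.0:
        raise ValueError("correlation is undefined for an all-zero vector")
    return dot / (norm_p * norm_m)


# Made-up flux values for four reactions, for illustration only.
predicted_fluxes = [10.0, 7.5, 2.0, 0.5]
measured_fluxes = [9.0, 8.0, 1.0, 1.0]
print(round(uncentered_pearson(predicted_fluxes, measured_fluxes), 3))
```

Because the means are not removed, the measure depends on the direction of the flux vectors but not on a common positive scale factor.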
Beyond this gain in predictive accuracy over competing methods, our method addresses the other limitations of previous approaches described above: it needs only one set of gene expression data as input; it uses absolute gene expression data without arbitrary discretization; it can be used when the objective function or the carbon source of an organism is not known; it produces a unique flux solution; and it is easy to implement. Our method should make it possible to identify fundamental mechanisms of metabolic responses and to find reasonable molecular targets for metabolic engineering efficiently. The present study may be extended to the metabolism of other organisms, especially multicellular eukaryotes.