Scaling up biological processes with confidence requires a consistent and reliable model of cell metabolism. Developing a robust model is difficult because experiments are time-consuming and expensive, and knowledge gaps often remain even when quality-by-design approaches are used. It is therefore essential to understand the quality of models we can expect from limited data, the amount of data required to build accurate metabolism models, and how sensitive simulations of scaled-up processes are to the choice of model.
In this study, we investigate three popular classes of cell metabolism models: (1) equation-based mechanistic models with unknown parameters; (2) hybrid models that use machine learning (ML) to learn the error of a mechanistic model; and (3) neural ODEs. Each class of model is trained on both a large synthetic dataset derived from a mechanistic CHO model and a process development dataset from a CHO-GS cell line. We progressively introduce more training data to the models (from fewer than 5 runs to more than 20 runs) and report the consequences for model trainability and accuracy. Finally, we perform in silico scale-up studies using a typical fed-batch process to illustrate the extrapolation potential of each model type at each training set size, and its sensitivity to scale-dependent process conditions.
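
To make the distinction between the three model classes concrete, the following is a minimal sketch, not the models used in this study. It assumes a two-state system (biomass and one substrate) with Monod growth kinetics, illustrative parameter values, and a small untrained network with random weights; all function names, parameters, and values are hypothetical. It shows only how the right-hand side of the ODE differs across the three classes.

```python
import math
import random

def mechanistic_rhs(state, params):
    """Class 1: equation-based model (assumed Monod kinetics).
    dX/dt = mu * X, dS/dt = -mu * X / Y, with mu = mu_max * S / (K_s + S)."""
    x, s = state
    mu = params["mu_max"] * s / (params["k_s"] + s)
    return [mu * x, -mu * x / params["yield_xs"]]

def make_mlp(n_in, n_hidden, n_out, seed=0, scale=0.1):
    """Tiny one-hidden-layer network with random, untrained weights.
    In practice the weights would be fitted to run data."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, scale) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0, scale) for _ in range(n_hidden)] for _ in range(n_out)]

    def forward(v):
        hidden = [math.tanh(sum(w * vi for w, vi in zip(row, v))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, hidden)) for row in w2]

    return forward

def hybrid_rhs(state, params, correction):
    """Class 2: mechanistic rates plus a learned correction term."""
    base = mechanistic_rhs(state, params)
    return [b + c for b, c in zip(base, correction(state))]

def neural_ode_rhs(state, net):
    """Class 3: the network alone defines the rates."""
    return net(state)

def euler(rhs, state0, t_end, dt=0.01):
    """Fixed-step explicit Euler integration; states are clamped at zero
    so concentrations cannot become negative."""
    state = list(state0)
    for _ in range(int(round(t_end / dt))):
        deriv = rhs(state)
        state = [max(0.0, s + dt * d) for s, d in zip(state, deriv)]
    return state

# Illustrative values only (units: h^-1, g/L, g/g); not taken from the study.
params = {"mu_max": 0.04, "k_s": 0.5, "yield_xs": 0.5}
state0 = [0.5, 20.0]  # initial biomass and substrate
net = make_mlp(n_in=2, n_hidden=8, n_out=2)

mech_final = euler(lambda s: mechanistic_rhs(s, params), state0, 24.0)
hybrid_final = euler(lambda s: hybrid_rhs(s, params, net), state0, 24.0)
node_final = euler(lambda s: neural_ode_rhs(s, net), state0, 24.0)
```

In this sketch the hybrid model reduces to the mechanistic one when the correction is zero, whereas the neural ODE carries no built-in mass balance, which is one reason the classes may behave differently when extrapolating from few runs.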