Identifying a feasible design space is fundamental to bioprocess design, as it ensures that the chosen operating conditions deliver the desired process performance and product quality while satisfying regulatory and operational constraints. The design space is typically defined as the set of input conditions under which a system operates within acceptable performance limits [1]. In many bioprocesses, particularly those involving complex biological systems for which no mechanistic model is available, design space identification must rely solely on experimental data. This is challenging because of the inherent variability of biological systems, the uncertainty in experimental measurements, and the high cost of physical experiments.
Probabilistic design space identification extends conventional design space exploration by incorporating uncertainty quantification, so that feasible regions are characterized probabilistically rather than deterministically. Each point in the design space is associated with a probability of feasibility, i.e., the probability that operating at that point satisfies the predefined operational criteria [2,3]. This is particularly valuable for bioprocesses, where experimental data are commonly stochastic and this stochasticity must be accounted for when identifying the design space. However, traditional experimental strategies often follow a sequential paradigm, conducting one experiment at a time, which limits efficiency. To address these challenges, this work presents a novel probabilistic design space identification framework that exploits parallel experimentation to improve efficiency and robustness.
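One common way to formalize the probability of feasibility, in the spirit of [2,3], is shown below. It assumes that the operational criteria can be written as m constraints g_j(x, θ) ≥ 0 on inputs x and uncertain quantities θ. The reliability threshold α is an assumed user choice and is not a value stated in this work. In the data-driven setting considered here, P_f is not evaluated from a mechanistic model but estimated from experimental data by a surrogate.

```latex
P_f(\mathbf{x}) = \Pr_{\boldsymbol{\theta}}\left[\, g_j(\mathbf{x}, \boldsymbol{\theta}) \ge 0,\; j = 1, \dots, m \,\right],
\qquad
\mathcal{D}_\alpha = \left\{ \mathbf{x} \in \mathcal{X} \;:\; P_f(\mathbf{x}) \ge \alpha \right\}
```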
The proposed approach is iterative: the current probabilistic design space estimate guides the selection of subsequent experimental points. In each iteration, multiple experiments are conducted in parallel, leveraging the high-throughput experimentation capabilities available in bioprocess development. A Gaussian Process (GP) model is then trained on the experimental data. Because the GP is itself a probabilistic model, it captures the probabilistic nature of the design space, which makes it well suited to the inherently stochastic experimental data. The GP surrogate is used to estimate the probability of feasibility across the design space, and this estimate serves as the basis for adaptive sampling.
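The sketch below shows one way a GP can turn experimental data into a probability of feasibility. It rests on several assumptions. There is a single scalar performance output y with a lower specification limit y_min. The kernel is a fixed RBF kernel with hand-picked hyperparameters, and no marginal-likelihood fitting is done. The names `SimpleGP` and `probability_of_feasibility` are hypothetical. Duplicate experiments at the same input are handled through the noise term, which also keeps the kernel matrix positive definite. A practical implementation would normally use an established GP library.

```python
import math
from statistics import NormalDist


class SimpleGP:
    """GP regression on mean-centred outputs with a fixed RBF kernel (no hyperparameter fitting)."""

    def __init__(self, X, y, length=0.3, signal_var=1.0, noise_var=1e-2):
        self.X, self.length = X, length
        self.signal_var, self.noise_var = signal_var, noise_var
        self.y_mean = sum(y) / len(y)
        n = len(X)
        # Kernel matrix plus observation noise on the diagonal (handles duplicate inputs).
        K = [[self._k(X[i], X[j]) + (noise_var if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        self.L = self._cholesky(K)
        centred = [v - self.y_mean for v in y]
        # alpha = K^{-1} (y - mean), via two triangular solves.
        self.alpha = self._solve_upper(self._solve_lower(centred))

    def _k(self, a, b):
        d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return self.signal_var * math.exp(-0.5 * d2 / self.length ** 2)

    @staticmethod
    def _cholesky(K):
        n = len(K)
        L = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                s = sum(L[i][k] * L[j][k] for k in range(j))
                L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
        return L

    def _solve_lower(self, b):  # solves L z = b by forward substitution
        L, z = self.L, []
        for i in range(len(b)):
            z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        return z

    def _solve_upper(self, b):  # solves L^T x = b by back substitution
        L, n = self.L, len(b)
        x = [0.0] * n
        for i in reversed(range(n)):
            x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
        return x

    def predict(self, x):
        """Predictive mean and std of a new experiment at x (noise variance included)."""
        k = [self._k(x, xi) for xi in self.X]
        mu = self.y_mean + sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = self._solve_lower(k)
        var = self.signal_var - sum(vi * vi for vi in v) + self.noise_var
        return mu, math.sqrt(max(var, 1e-12))


def probability_of_feasibility(gp, x, y_min):
    """P(y(x) >= y_min) under the Gaussian predictive distribution at x."""
    mu, sd = gp.predict(x)
    return 1.0 - NormalDist(mu, sd).cdf(y_min)


# Toy data: 9 conditions on a 3x3 grid, each run in duplicate (18 "experiments").
grid = [(a, b) for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]
X = [x for x in grid for _ in range(2)]
y = [1.0 - x[0] + (0.02 if r == 0 else -0.02) for x in grid for r in range(2)]
gp = SimpleGP(X, y)
p_good = probability_of_feasibility(gp, (0.0, 0.5), y_min=0.5)
p_bad = probability_of_feasibility(gp, (1.0, 0.5), y_min=0.5)
```

Including the noise variance in the predictive spread makes P_f describe the chance that a new, noisy run at x meets the specification. This choice fits data with run-to-run variability. Dropping the noise term would instead give the probability that the underlying mean response is feasible.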
For adaptive sampling, we introduce a space-filling strategy that integrates Delaunay triangulation with probabilistic feasibility weighting. Specifically, each vertex of the triangulation is assigned a weight corresponding to its probability of feasibility as determined by the GP model. This approach ensures that experimental points are selected not only to maintain space-filling characteristics but also to prioritize regions of the design space with higher feasibility probabilities. By iteratively updating the GP model with newly acquired experimental data, the proposed framework progressively refines the probabilistic design space.
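The weighting scheme above can be illustrated with a small 2-D sketch. The abstract does not specify how weighted triangles become new experimental points, so the selection rule here is a hypothetical stand-in. Each triangle is scored by its area multiplied by the mean feasibility probability of its three vertices. The centroid of the best-scoring triangle is then added, the triangulation is rebuilt, and these steps repeat until a batch of 9 conditions is obtained. A plain Bowyer-Watson triangulation is written out so the example has no dependencies; `scipy.spatial.Delaunay` would normally be used instead. The names `propose_batch` and `prob_feasible` are illustrative, and the latter could be the GP-based estimate.

```python
import math


def circumcircle(a, b, c):
    """Return (centre_x, centre_y, radius^2) of the circle through points a, b, c."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-14:  # collinear points: treat the circle as infinitely large
        return 0.0, 0.0, math.inf
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay(points):
    """Bowyer-Watson Delaunay triangulation of 2-D points; returns index triples."""
    n = len(points)
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    mx, my = (max(xs) + min(xs)) / 2, (max(ys) + min(ys)) / 2
    # A large "super triangle" enclosing every point; removed at the end.
    pts = list(points) + [(mx - 20 * span, my - span), (mx, my + 20 * span),
                          (mx + 20 * span, my - span)]
    tris = [(n, n + 1, n + 2)]
    for i in range(n):
        px, py = pts[i]
        bad = []  # triangles whose circumcircle contains the new point
        for t in tris:
            ux, uy, r2 = circumcircle(pts[t[0]], pts[t[1]], pts[t[2]])
            if (px - ux) ** 2 + (py - uy) ** 2 < r2:
                bad.append(t)
        edges = {}  # boundary of the cavity = edges used by exactly one bad triangle
        for t in bad:
            for e in ((t[0], t[1]), (t[1], t[2]), (t[2], t[0])):
                key = tuple(sorted(e))
                edges[key] = edges.get(key, 0) + 1
        bad_set = set(bad)
        tris = [t for t in tris if t not in bad_set]
        tris += [(a, b, i) for (a, b), count in edges.items() if count == 1]
    return [t for t in tris if max(t) < n]


def tri_area(a, b, c):
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def propose_batch(points, prob_feasible, batch_size=9):
    """Greedy feasibility-weighted Delaunay refinement (hypothetical scoring rule)."""
    pts = list(points)
    weights = [prob_feasible(p) for p in pts]
    new_points = []
    for _ in range(batch_size):
        best, best_score = None, -1.0
        for t in delaunay(pts):
            a, b, c = (pts[k] for k in t)
            # Space filling (area) times mean vertex feasibility probability.
            score = tri_area(a, b, c) * sum(weights[k] for k in t) / 3.0
            if score > best_score:
                best, best_score = (a, b, c), score
        centroid = (sum(p[0] for p in best) / 3.0, sum(p[1] for p in best) / 3.0)
        new_points.append(centroid)
        pts.append(centroid)
        weights.append(prob_feasible(centroid))
    return new_points


# Toy usage: unit-square corners plus centre, feasibility assumed to fall with x.
initial = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
batch = propose_batch(initial, lambda p: 1.0 - p[0], batch_size=9)
```

Large triangles favor unexplored gaps, which is the space-filling term. The vertex weights pull new points toward regions the surrogate currently deems feasible. A triangle whose vertices all have zero probability scores zero regardless of its size, so a small floor on the weights may be needed to keep some exploration.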
We evaluate the efficacy of the proposed approach through an in-silico case study of a Continuous Stirred Tank Reactor (CSTR) system [3]. To capture experimental variability realistically, data are generated using high-fidelity simulations across multiple operating conditions, with duplicate simulations at each condition. The duplicates are obtained by randomly sampling the model parameters from their respective distributions while keeping the input conditions fixed, which mimics the stochastic behaviour of a real system.
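The duplicate-generation step can be sketched as follows. The CSTR model of [3] is not reproduced in this abstract, so a textbook stand-in is assumed: an isothermal CSTR with first-order kinetics A → B. Its steady-state conversion is X = kτ/(1 + kτ), with the Arrhenius rate constant k = k0 exp(-Ea/RT). The parameter distributions (log-normal k0, normal Ea), their values, and the function names are all illustrative assumptions.

```python
import math
import random

R = 8.314  # gas constant, J/(mol K)


def cstr_conversion(tau_min, T_K, k0, Ea):
    """Steady-state conversion of A in an isothermal CSTR with first-order A -> B."""
    k = k0 * math.exp(-Ea / (R * T_K))  # Arrhenius rate constant, 1/min
    return k * tau_min / (1.0 + k * tau_min)


def run_duplicates(conditions, n_rep=2, seed=0):
    """Simulate n_rep 'experiments' per fixed condition, redrawing kinetic parameters each run."""
    rng = random.Random(seed)
    data = []
    for tau, T in conditions:
        for _ in range(n_rep):
            k0 = rng.lognormvariate(math.log(1e7), 0.1)  # pre-exponential factor, 1/min (assumed)
            Ea = rng.gauss(5.0e4, 1.0e3)                 # activation energy, J/mol (assumed)
            data.append(((tau, T), cstr_conversion(tau, T, k0, Ea)))
    return data


# 9 operating conditions (residence time in min, temperature in K), each in duplicate.
conditions = [(tau, T) for tau in (1.0, 5.0, 10.0) for T in (320.0, 350.0, 380.0)]
data = run_duplicates(conditions)  # 18 runs in total
```

The inputs stay fixed within each pair, and only the sampled parameters differ. The spread between duplicates therefore plays the role of experimental variability, which the GP's noise term is meant to absorb.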
In each iteration, 18 parallel experiments are conducted at 9 distinct operating conditions (i.e., in duplicate), followed by probabilistic design space identification and adaptive sampling, which yields 9 new operating conditions for the next iteration. This procedure is repeated until the mean absolute error (MAE) of the predicted probability of feasibility falls below 0.05. A comparative analysis against conventional sampling strategies, including Sobol and Latin Hypercube Sampling (LHS), shows that the proposed adaptive sampling methodology substantially improves design space accuracy for a given number of experiments: it achieves a 66% and 68% reduction in MAE compared with Sobol sampling and LHS, respectively, underscoring its superior predictive accuracy.
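The iteration and stopping rule can be expressed as a short loop. The sketch makes three assumptions. First, a reference probability of feasibility is available on an evaluation grid, which is possible in an in-silico study where the true model is known. Second, `run_iteration` is a hypothetical callable that bundles one round of 18 parallel runs, GP refitting and adaptive sampling, and returns the current MAE. Third, the relative MAE reduction is taken here to be 1 - MAE_proposed / MAE_baseline.

```python
def mean_absolute_error(pred, ref):
    """MAE between predicted and reference probabilities of feasibility on a common grid."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def relative_reduction(mae_baseline, mae_proposed):
    """Fractional MAE reduction of the proposed method relative to a baseline sampler."""
    return 1.0 - mae_proposed / mae_baseline


def run_campaign(run_iteration, tol=0.05, runs_per_iteration=18, max_iter=50):
    """Repeat parallel batches until the MAE drops below tol (or max_iter is reached).

    Returns (iterations used, total experiments, final MAE).
    """
    for it in range(1, max_iter + 1):
        err = run_iteration()
        if err < tol:
            return it, it * runs_per_iteration, err
    return max_iter, max_iter * runs_per_iteration, err


# Toy usage with made-up MAE values, one per batch.
fake_mae = iter([0.21, 0.12, 0.07, 0.04])
iterations, n_experiments, final_mae = run_campaign(lambda: next(fake_mae))
```

With these made-up values, the loop stops after the fourth batch, when the MAE first falls below 0.05. At 18 runs per batch, this corresponds to 72 simulated experiments.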
Furthermore, we benchmark our approach against batch Bayesian optimization [4], an extension of Bayesian optimization tailored to parallel function evaluations, using several acquisition functions. The results demonstrate that our method provides a more precise and comprehensive characterization of the feasible design space while requiring significantly fewer experiments. By integrating probabilistic modeling with adaptive sampling, the proposed framework reduces experimental cost while improving predictive reliability.
The contributions of this work are threefold. First, we present a framework for probabilistic design space identification that relies solely on experimental data and does not require a mechanistic model. Second, we introduce an adaptive sampling strategy that leverages parallel experimentation and combines space-filling properties with feasibility-based weighting. Third, we validate the proposed methodology on a realistic engineering case study, demonstrating superior performance over conventional sampling techniques. Given the increasing complexity of modern bioprocesses, this approach offers a practical and efficient solution for data-driven design space exploration in systems where mechanistic understanding is limited.
Acknowledgments
The authors gratefully acknowledge funding from the Engineering and Physical Sciences Research Council U.K. (EP/X024156/1 and EP/W035006/1). Support from the UKRI Impact Acceleration Account (EP/X52556X/1) is also gratefully acknowledged.
References
- Kasemiire A, Avohou HT, De Bleye C, Sacre PY, Dumont E, Hubert P, Ziemons E. Design of experiments and design space approaches in the pharmaceutical bioprocess optimization. European Journal of Pharmaceutics and Biopharmaceutics. 2021 Sep 1;166:144-54.
- Kusumo KP, Gomoescu L, Paulen R, García Muñoz S, Pantelides CC, Shah N, Chachuat B. Bayesian approach to probabilistic design space characterization: A nested sampling strategy. Industrial & Engineering Chemistry Research. 2019 Nov 26;59(6):2396-408.
- Kucherenko S, Giamalakis D, Shah N, García-Muñoz S. Computationally efficient identification of probabilistic design spaces through application of metamodeling and adaptive sampling. Computers & Chemical Engineering. 2020 Jan 4;132:106608.
- Azimi J, Fern A, Fern X. Batch Bayesian optimization via simulation matching. Advances in Neural Information Processing Systems. 2010;23.