Abstract
Biopharmaceutical process development is often time-consuming and cost-intensive, due in large part to the complex and highly regulated nature of upstream (e.g., fermentation, in vitro transcription) and downstream (e.g., chromatography) operations. Although Quality by Design (QbD) and systematic modeling have helped reduce trial-and-error experimentation,[1] there remains a need for automated methods that can propose and evaluate novel process solutions. Recent advances in generative AI, especially large language models (LLMs), show promise for producing new “designs,” yet generic models frequently lack core bioprocess constraints (e.g., enzyme kinetics, stoichiometric limits, safety margins). As a result, they can yield infeasible or unsafe operating conditions.[2,3]
Recent research demonstrates the potential of domain-adapted and knowledge-augmented LLMs to enhance performance in specialized applications. For instance, Maharjan et al. show how prompt strategies can yield state-of-the-art performance on multiple medical benchmark datasets.[4] Similarly, Ahmed et al. applied tailored prompt engineering in healthcare contexts, enabling LLMs to predict medicine prescriptions from clinical notes and to diagnose pneumonia from X-ray image analysis.[5]
In this work, we present a domain-focused generative AI framework centered on two key components: prompt engineering and fine-tuning with curated datasets. We begin by collecting relevant literature, patents, and published data in digital text form (e.g., PDFs, structured CSVs, and equipment documentation). These texts are parsed and tokenized to enable fine-tuning of a pre-trained LLM, in our case OpenAI models. The tokens are segmented into context windows that capture process-specific constraints (e.g., maximum column pressure, stoichiometric ratios) and examples of best practice. During fine-tuning, the model is exposed to authentic process examples highlighting acceptable operating ranges, typical yields, and engineering heuristics, thereby embedding domain-specific knowledge into the model’s parameters. These context windows reveal how real processes balance practical considerations (e.g., column loading limits, pH stability, or enzyme saturation) against productivity and cost. We measure performance (accuracy of parameter suggestions, alignment with known constraints) before and after the fine-tuning step, quantifying improvements via standardized metrics (e.g., ratio of infeasible outputs, average deviation from known optimal settings).

The prompt engineering phase provides additional structure during inference. We craft specialized prompts that remind the LLM of essential constraints and objectives. For instance, we prepend instructions (e.g., “Only propose temperatures within the range of 10–25 °C for this enzyme”) and embed domain keywords (e.g., “ligand density,” “substrate saturation,” “resin lifetime”) to orient the LLM toward generating context-relevant outputs. We also vary prompt length and format, from short, high-level requests to multi-step instructions, to determine which style yields the most complete and accurate solutions.
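To make this pipeline concrete, the sketch below illustrates, with hypothetical file names, constraint values, and helper functions rather than our actual implementation, how a curated process example can be packaged as a chat-formatted JSONL fine-tuning record of the kind accepted by OpenAI fine-tuning endpoints, and how a simple “ratio of infeasible outputs” metric can be scored against a known operating range.

```python
import json

# Hypothetical constraint set for one unit operation (illustrative values only).
CONSTRAINTS = {"temperature_C": (10.0, 25.0), "max_column_pressure_bar": 3.0}

def make_finetune_record(question: str, answer: str) -> dict:
    """Package one curated process example as a chat-formatted training record."""
    system = (
        "You are a bioprocess development assistant. "
        "Only propose temperatures within 10-25 C for this enzyme "
        "and keep column pressure below 3 bar."
    )
    return {"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]}

def infeasible_ratio(proposed_temps_C: list[float]) -> float:
    """Fraction of model-proposed temperatures outside the allowed range."""
    lo, hi = CONSTRAINTS["temperature_C"]
    bad = sum(1 for t in proposed_temps_C if not (lo <= t <= hi))
    return bad / len(proposed_temps_C) if proposed_temps_C else 0.0

if __name__ == "__main__":
    record = make_finetune_record(
        "Suggest loading conditions for the capture chromatography step.",
        "Load at 15 C and 2.5 bar, keeping ligand density within the validated range.",
    )
    with open("finetune_examples.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")  # one JSON object per line (JSONL)
    print(infeasible_ratio([12.0, 18.0, 30.0]))  # -> 0.333...
```

The same constraint-laden system message that anchors each training record can be reused verbatim as the inference-time prompt, which is how the fine-tuning and prompt engineering components reinforce each other.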
We demonstrate the effectiveness of our combined prompt engineering and fine-tuning framework using two case studies. The first focuses on resin screening in downstream chromatography, an expensive bottleneck in many bioprocesses, where the fine-tuned LLM is guided by domain-specific prompts to propose feasible resin chemistries and operating strategies. We compare these suggestions against real data and observe improved plausibility (fewer unrealistic column capacities and incompatible ligand selections) relative to a baseline, non-fine-tuned LLM. The second case study targets mechanistic modeling of in vitro transcription, an upstream reaction often used to produce mRNA. We task the model with drafting fundamental equations that capture stoichiometric balances, enzyme kinetics, and buffer interactions. With carefully structured prompts and domain-focused fine-tuning, the model not only provides the standard T7 polymerase reaction equations but also highlights additional aspects, such as the formation and role of magnesium pyrophosphate, that the baseline model does not mention. This result illustrates how domain enrichment can lead to deeper mechanistic insights and guide more informative experimental design in bioprocess development.

This presentation will discuss these and related insights, detail the technical methodology (including data pipelines for fine-tuning and strategies for prompt engineering), and compare baseline versus domain-adapted LLM outputs. In future work, we plan to integrate a symbolic AI layer for rule-based checks, allowing us to screen out any residual violations of fundamental principles before final acceptance.
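As an illustration of the kind of mechanistic description involved, a minimal, simplified formulation of the T7 in vitro transcription system (our own illustrative sketch, not the exact equations produced by the model) can be written as:

```latex
% Simplified, illustrative description of T7 in vitro transcription.
\begin{align}
  % Overall stoichiometry: each incorporated NTP releases one pyrophosphate (PP_i)
  \mathrm{DNA_{template}} + \sum_{j \in \{A,U,G,C\}} n_j\,\mathrm{NTP}_j
    &\xrightarrow{\ \mathrm{T7\ RNAP}\ } \mathrm{mRNA} + N\,\mathrm{PP_i},
    \qquad N = \sum_j n_j \\
  % Saturable (Michaelis--Menten-type) rate of RNA synthesis
  r_{\mathrm{RNA}} &= k_{\mathrm{cat}}\,[\mathrm{T7}]\,
    \frac{[\mathrm{NTP}]}{K_M + [\mathrm{NTP}]} \\
  % Sequestration of Mg^{2+} as magnesium pyrophosphate
  2\,\mathrm{Mg^{2+}} + \mathrm{PP_i} &\rightleftharpoons \mathrm{Mg_2PP_i}\!\downarrow
\end{align}
```

The third relation is the aspect surfaced by the domain-adapted model: accumulating pyrophosphate sequesters free Mg²⁺ as magnesium pyrophosphate, which can limit polymerase activity over the course of the reaction.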
Keywords: large language models, fine-tuning, prompt engineering, process development, chromatography, in vitro transcription, biomanufacturing
References
[1] Shahab, M. A., Destro, F., & Braatz, R. D. (2025). Digital Twins in Biopharmaceutical Manufacturing: Review and Perspective on Human-Machine Collaborative Intelligence. arXiv preprint arXiv:2504.00286.
[2] Venkatasubramanian, V., & Chakraborty, A. (2025). Quo vadis ChatGPT? From large language models to large knowledge models. Computers & Chemical Engineering, 192, 108895.
[3] Schweidtmann, A. M. (2024). Generative artificial intelligence in chemical engineering. Nature Chemical Engineering, 1(3), 193-193.
[4] Maharjan, J., Garikipati, A., Singh, N. P., Cyrus, L., Sharma, M., Ciobanu, M., ... & Das, R. (2024). OpenMedLM: prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models. Scientific Reports, 14(1), 14156.
[5] Ahmed, A., Hou, M., Xi, R., Zeng, X., & Shah, S. A. (2024, May). Prompt-Eng: Healthcare prompt engineering: Revolutionizing healthcare applications with precision prompts. In Companion Proceedings of the ACM Web Conference 2024 (pp. 1329–1337).