2025 AIChE Annual Meeting

(588ch) Enhancing Polymer Design with Fine-Tuned Large Language Models: Bridging Chemical and Natural Languages

Large Language Models (LLMs) have exhibited remarkable capabilities in natural language processing and are now being adapted to address scientific challenges in specific domains. This research advances LLM applications by fine-tuning models like the GPT series using polymer-specific datasets, thereby enhancing their comprehension of chemical structures and properties. By integrating chemical languages such as SMILES (Simplified Molecular Input Line Entry System) with natural language descriptions, we bridge the gap between chemical and natural languages, demonstrating its capability to effectively recognize polymer patterns based on structural and nomenclature inputs. Multi-task fine-tuning enhances generalization compared to single-task approaches and shows significant accuracy improvements over in-context learning methods such as zero-shot and few-shot techniques. This strategy significantly boosts predictive accuracy and efficiency compared to general-purpose LLMs, paving the way for the design of novel polymer materials. Beyond polymers, this work suggests new possibilities for LLMs in scientific fields that require the integration of formal and natural language information.