(588bo) ProcedureT5: Enhanced Experimental Procedure Prediction with Pre-Training and Data Augmentation
2025 AIChE Annual Meeting, Poster Sessions, General Poster Session
We introduce ProcedureT5, an approach that combines chemistry-oriented pre-trained models with augmented multi-source datasets to improve the prediction of experimental procedures across a broader range of scenarios. On the Pistachio dataset, a collection of reaction procedures derived from US patent literature, our method achieves state-of-the-art performance, with a 4-point increase in BLEU score and a 34% improvement in exact-match accuracy over existing methods. We also curate a small expert-annotated dataset, Orgsyn, consisting of verified organic synthesis procedures, to assess the model’s performance in more diverse applications. Fine-tuning ProcedureT5 on Orgsyn demonstrates its adaptability, yielding a BLEU score of 41.19 and an average similarity of 50.58%. These results indicate that ProcedureT5 can help bridge the gap between computational synthesis planning and practical laboratory implementation.
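To make the reported metrics more concrete, the following is a minimal sketch of how exact-match accuracy and an average similarity score might be computed for predicted procedures. The abstract does not define either metric, so the definitions here are assumptions: exact match is taken as string equality after whitespace normalization, and similarity as a token-level normalized Levenshtein similarity. The function names and the example action strings are hypothetical and are not taken from ProcedureT5.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def similarity(pred, ref):
    """Assumed metric: 1 - (token edit distance / length of the longer sequence)."""
    p, r = pred.split(), ref.split()
    longest = max(len(p), len(r))
    if longest == 0:
        return 1.0  # two empty procedures are treated as identical
    return 1.0 - levenshtein(p, r) / longest


def evaluate(preds, refs):
    """Return exact-match accuracy and average similarity, both in percent."""
    if len(preds) != len(refs) or not preds:
        raise ValueError("preds and refs must be non-empty and of equal length")
    n = len(preds)
    # Assumed exact-match rule: equal after collapsing whitespace.
    exact = sum(" ".join(p.split()) == " ".join(r.split())
                for p, r in zip(preds, refs))
    sim = sum(similarity(p, r) for p, r in zip(preds, refs))
    return {"exact_match": 100.0 * exact / n,
            "avg_similarity": 100.0 * sim / n}


if __name__ == "__main__":
    # Hypothetical action sequences, for illustration only.
    preds = ["ADD water ; STIR 1 h", "FILTER"]
    refs = ["ADD water ; STIR 2 h", "FILTER"]
    print(evaluate(preds, refs))
```

Under these assumptions, a prediction that differs from its reference in a single token still earns partial similarity credit but counts as a miss for exact match, which is one reason the two figures can differ widely on the same dataset.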