2024 AIChE Annual Meeting
(372ad) Leveraging Large Language Models for the Identification, Annotation, and Substrate Prediction of Transporter Proteins
Transporter proteins play indispensable roles in cellular membranes, mediating essential biological functions such as molecule transport, nutrient uptake, ion homeostasis, and drug interactions [1]. Despite their importance, transporter proteins remain underexplored and difficult to annotate because of their diversity, complex structures, and nuanced functions. This study presents a computational framework that addresses these obstacles and improves the identification, annotation, and characterization of transporter proteins by leveraging deep learning and pre-trained large language models. The framework comprises three machine learning models: (1) a binary classification model that identifies transporter proteins, (2) a prototypical model that predicts the Transporter Classification (TC) family, enabling detailed annotation of previously uncharacterized transporters, and (3) a substrate specificity prediction model that identifies the molecule(s) transported by these proteins. On the transporter identification task, our model achieved an accuracy of 0.956, surpassing previously reported methods [2]. Our TC family prediction model performed comparably to the state-of-the-art tool TC-blast, specifically for underrepresented TC families [3]. Lastly, our substrate specificity prediction model reached an accuracy of 0.931 on our own carefully curated dataset, outperforming existing ML-based approaches [4].
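The prototypical model for TC family prediction can be illustrated with a minimal sketch of nearest-prototype classification, the general idea behind prototypical networks. It is not the authors' implementation: the toy three-dimensional vectors stand in for protein language model embeddings, and the family labels and the function names `build_prototypes`, `squared_distance`, and `predict_family` are hypothetical.

```python
from collections import defaultdict
from math import fsum


def build_prototypes(embeddings, labels):
    """Average the support embeddings of each family into one prototype vector."""
    grouped = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        grouped[label].append(vector)
    return {
        label: [fsum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def predict_family(query, prototypes):
    """Return the family whose prototype lies closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Toy support set (assumption): real inputs would be high-dimensional
# embeddings produced by a pre-trained protein language model.
support_embeddings = [[0.9, 0.1, 0.0], [1.1, -0.1, 0.2], [0.0, 1.0, 0.9]]
support_labels = ["family_A", "family_A", "family_B"]  # hypothetical TC families

prototypes = build_prototypes(support_embeddings, support_labels)
predicted = predict_family([0.1, 0.8, 1.0], prototypes)
print(predicted)
```

Because each family is summarized by the mean of its examples, a family with only one or two known members still receives a prototype, which is one reason this style of model is often considered for underrepresented classes.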
References:
1. David, R., Byrt, C. S., Tyerman, S. D., Gilliham, M., & Wege, S. (2019). Roles of membrane transporters: connecting the dots from sequence to phenotype. Annals of Botany, 124(2), 201–208.
2. Wang, Q., Xu, T., Xu, K., Lu, Z., & Ying, J. (2023). Prediction of transport proteins from sequence information with the deep learning approach. Computers in Biology and Medicine, 160, 106974.
3. Saier, M. H., Reddy, V. S., Moreno-Hagelsieb, G., Hendargo, K. J., Zhang, Y., Iddamsetty, V., Lam, K. J. K., Tian, N., Russum, S., Wang, J., & Medrano-Soto, A. (2021). The Transporter Classification Database (TCDB): 2021 update. Nucleic Acids Research, 49, D461–D467.
4. Kroll, A., Niebuhr, N., Butler, G., & Lercher, M. J. (2023). A general prediction model for substrates of transport proteins. bioRxiv, https://doi.org/10.1101/2023.10.31.564943