Transporter proteins are indispensable components of cellular membranes, mediating essential biological functions such as molecular transport, nutrient uptake, ion homeostasis, and drug interactions [1]. Despite their critical importance, transporter proteins remain underexplored and difficult to annotate because of their diversity, complex structures, and nuanced functions. This study presents a computational framework that addresses these obstacles and improves the identification, annotation, and characterization of transporter proteins by leveraging deep learning and pre-trained large language models. The framework comprises three machine learning models: (1) a binary classification model that identifies transporter proteins; (2) a prototypical model that predicts Transporter Classification (TC) families, enabling detailed annotation of previously uncharacterized transporters; and (3) a substrate specificity prediction model that identifies the molecule(s) these proteins transport. For transporter identification, the binary classifier achieved an accuracy of 0.956, surpassing previously reported methods [2]. The TC family prediction model performed comparably to the state-of-the-art tool TC-blast, particularly for underrepresented TC families [3]. Lastly, the substrate specificity prediction model reached an accuracy of 0.931 on our carefully curated dataset, outperforming existing machine-learning-based approaches [4].
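To illustrate what a "prototypical model" for TC family prediction typically does, the sketch below assumes the standard prototypical-network recipe: each TC family is represented by a prototype, the mean embedding of its labelled member proteins, and a query protein is assigned to the family whose prototype is nearest in squared Euclidean distance, with a softmax over negative distances giving per-family probabilities. The embeddings (imagined here as fixed-length vectors from a pre-trained protein language model), the helper names `compute_prototypes` and `classify`, and the TC identifiers in the usage example are hypothetical; this is not the authors' implementation.

```python
import math
from collections import defaultdict


def compute_prototypes(support):
    """Average the support embeddings of each TC family into one prototype.

    support: list of (tc_family, embedding) pairs; every embedding is a list
    of floats of equal length (assumed to come from a pre-trained protein
    language model).
    """
    sums = {}
    counts = defaultdict(int)
    for family, emb in support:
        if family not in sums:
            sums[family] = [0.0] * len(emb)
        for i, x in enumerate(emb):
            sums[family][i] += x
        counts[family] += 1
    # Prototype = element-wise mean of the family's support embeddings.
    return {f: [x / counts[f] for x in s] for f, s in sums.items()}


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return (predicted family, {family: probability}) for one query embedding.

    Logits are negative squared distances to each prototype; a numerically
    stable softmax turns them into probabilities.
    """
    families = sorted(prototypes)
    logits = [-squared_euclidean(query, prototypes[f]) for f in families]
    m = max(logits)
    exps = [math.exp(logit - m) for logit in logits]
    total = sum(exps)
    probs = {f: e / total for f, e in zip(families, exps)}
    best = max(probs, key=probs.get)
    return best, probs


if __name__ == "__main__":
    # Toy 2-D "embeddings"; the TC labels are for illustration only.
    support = [("2.A.1", [1.0, 0.0]), ("2.A.1", [0.8, 0.2]), ("1.A.8", [0.0, 1.0])]
    protos = compute_prototypes(support)
    label, probs = classify([0.9, 0.05], protos)
    print(label, probs)
```

One plausible reason this design suits underrepresented families: a family with only a handful of labelled members still receives a prototype, and new families can be added by averaging their embeddings without retraining the underlying encoder.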
References:
- [1] David, R., Byrt, C. S., Tyerman, S. D., Gilliham, M., & Wege, S. (2019). Roles of membrane transporters: connecting the dots from sequence to phenotype. Annals of Botany, 124(2), 201–208.
- [2] Wang, Q., Xu, T., Xu, K., Lu, Z., & Ying, J. (2023). Prediction of transport proteins from sequence information with the deep learning approach. Computers in Biology and Medicine, 160, 106974.
- [3] Saier, M. H., Reddy, V. S., Moreno-Hagelsieb, G., Hendargo, K. J., Zhang, Y., Iddamsetty, V., Lam, K. J. K., Tian, N., Russum, S., Wang, J., & Medrano-Soto, A. (2021). The Transporter Classification Database (TCDB): 2021 update. Nucleic Acids Research, 49, D461–D467.
- [4] Kroll, A., Niebuhr, N., Butler, G., & Lercher, M. J. (2023). A general prediction model for substrates of transport proteins. bioRxiv. https://doi.org/10.1101/2023.10.31.564943