2025 AIChE Annual Meeting
(87d) Predicting Small Molecule Enrichment in Biomolecular Condensates through Transfer Learning
Motivated by this, we propose a transfer learning approach to build a machine learning (ML) model that predicts the enrichment of a small molecule in condensates formed by a given protein. To overcome the limited number of available experimental measurements, we will leverage two biomolecular foundation models, ESM-2 [4] and Molformer [5], to represent proteins and small molecules, respectively. Because these foundation models have already "learned" from millions of proteins and small molecules, using their embeddings as input features is expected to improve the generalizability of the trained ML model. We will therefore compute latent embeddings of the proteins and small molecules from these foundation models. A standard multilayer perceptron will then be trained on the available experimental measurements to predict the enrichment factor from the embeddings of a protein and a small molecule. The predictive accuracy of the resulting model will be tested experimentally by incubating E. coli that expresses a phase-separating protein with a fluorescent small molecule.
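The modeling step described above can be illustrated with a minimal sketch. It is not the study's implementation: the embeddings and enrichment values below are random placeholders, the embedding sizes (1280 and 768) are assumptions chosen to resemble the 650M-parameter ESM-2 checkpoint and MoLFormer-XL, and the single hidden layer, its width, the learning rate, and all function names are hypothetical. The sketch assumes each protein and each small molecule has already been reduced to one fixed-length vector, concatenates the two vectors, and fits a small multilayer perceptron by full-batch gradient descent on the mean squared error.

```python
import numpy as np

# Assumed embedding sizes (not stated in the abstract): 1280 resembles the
# 650M-parameter ESM-2 checkpoint, 768 resembles MoLFormer-XL.
PROTEIN_DIM = 1280
MOLECULE_DIM = 768


def init_mlp(in_dim, hidden_dim, rng):
    """Initialize a one-hidden-layer MLP with variance-scaled random weights."""
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, np.sqrt(1.0 / hidden_dim), size=(hidden_dim, 1)),
        "b2": np.zeros(1),
    }


def forward(params, x):
    """Return predictions of shape (n,) and the hidden-layer activations."""
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU
    pred = (hidden @ params["W2"] + params["b2"]).ravel()
    return pred, hidden


def mse(params, x, y):
    """Mean squared error between predictions and targets."""
    pred, _ = forward(params, x)
    return float(np.mean((pred - y) ** 2))


def train_step(params, x, y, lr):
    """Apply one full-batch gradient-descent step on the mean squared error."""
    n = x.shape[0]
    pred, hidden = forward(params, x)
    d_pred = (2.0 / n) * (pred - y)[:, None]             # dL/dpred, shape (n, 1)
    d_hidden = (d_pred @ params["W2"].T) * (hidden > 0)  # backprop through ReLU
    grads = {
        "W1": x.T @ d_hidden,
        "b1": d_hidden.sum(axis=0),
        "W2": hidden.T @ d_pred,
        "b2": d_pred.sum(axis=0),
    }
    return {name: params[name] - lr * grads[name] for name in params}


rng = np.random.default_rng(0)
n_pairs = 32  # hypothetical number of measured protein / small-molecule pairs

# Placeholders: real rows would be embeddings computed with ESM-2 and Molformer.
protein_emb = rng.normal(size=(n_pairs, PROTEIN_DIM))
molecule_emb = rng.normal(size=(n_pairs, MOLECULE_DIM))
# Placeholder targets standing in for measured enrichment factors.
enrichment = rng.normal(size=n_pairs)

# One feature row per pair: protein embedding followed by molecule embedding.
features = np.concatenate([protein_emb, molecule_emb], axis=1)
params = init_mlp(features.shape[1], hidden_dim=64, rng=rng)

initial_loss = mse(params, features, enrichment)
for _ in range(200):
    params = train_step(params, features, enrichment, lr=1e-3)
final_loss = mse(params, features, enrichment)
print(f"training MSE: {initial_loss:.3f} -> {final_loss:.3f}")
```

Because the foundation models stay frozen and only the small perceptron is fitted, the number of trainable parameters remains modest relative to end-to-end training, which is the point of the transfer learning setup. The sketch reports training error only. Any real assessment would need held-out protein and small-molecule pairs, in addition to the experimental test described above.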
References
1. D. Lee, M. Walls, K. Siu, Y. Dai, K. Xu, C. Brangwynne, A. Chilkoti, J. Avalos and L. You, "Principles of metabolic pathway control by biomolecular condensates in cells," Nature Chemical Engineering, 2025.
2. S. A. Thody, H. D. Clements, H. Baniasadi, A. S. Lyon, M. S. Sigman and M. K. Rosen, "Small Molecule Properties Define Partitioning into Biomolecular Condensates," Nature Chemistry, vol. 16, pp. 1794-1802, 2024.
3. H. Kilgore, P. Mikhael, K. Overholt, A. Boija, N. Hannett, C. Van Dongen, T. Lee, Y. Chang, R. Barzilay and R. Young, "Distinct chemical environments in biomolecular condensates," Nature Chemical Biology, pp. 1-11, 2023.
4. Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y. Shmueli and A. dos Santos Costa, "Evolutionary-scale prediction of atomic-level protein structure with a language model," Science, vol. 379, no. 6637, pp. 1123-1130, 2023.
5. J. Ross, B. Belgodere, V. Chenthamarakshan, I. Padhi, Y. Mroueh and P. Das, "Large-scale chemical language representations capture molecular structure and properties," Nature Machine Intelligence, vol. 4, no. 12, pp. 1256-1264, 2022.