In living cells, proteins and other biomolecules undergo spontaneous phase separation to form biomolecular condensates. One direct consequence of this phase separation is the formation of membraneless microcompartments in which biomolecules are selectively enriched. Inspired by this feature, biomolecular condensates have been suggested as a promising new engineering tool for controlling biochemical reactions in living cells. When enzymes are engineered to undergo phase separation in living cells, they become enriched inside condensates, which can increase the rates of the reactions they catalyze. Our previous study [1] showed that this is possible as long as the catalytic rate of the enzyme is increased inside condensates. In the same study, mathematical modeling showed that reaction rates can be further modulated by the enrichment of reaction substrates inside condensates. Specifically, phase separation can increase the reaction rate by raising the local substrate concentration. Substrate enrichment also reduces the availability of the substrate to an enzyme catalyzing a competing reaction, which increases the relative yield. However, the mechanisms governing substrate enrichment in biomolecular condensates are not yet fully elucidated. Although Thody et al. [2] and Kilgore et al. [3] independently measured the enrichment of small molecules inside condensates formed by six proteins, we still lack a comprehensive understanding of the rules governing the enrichment of small molecules in condensates.
Motivated by this gap, this study proposes a transfer learning approach to construct a machine learning (ML) model that predicts the enrichment of a small molecule in condensates formed by a given protein. To overcome the limited size of the available experimental measurements, we will leverage two biomolecular foundation models, ESM-2 [4] and Molformer [5], to represent proteins and small molecules, respectively. Because these foundation models have already been trained on millions of proteins and small molecules, using their embeddings as input features is expected to improve the generalizability of an ML model trained on a small dataset. Hence, we will first compute the latent embeddings of the proteins and small molecules in the available experimental measurements from these foundation models. Subsequently, a standard multilayer perceptron will be trained to predict the enrichment factor from the embeddings of a protein and a small molecule. The predictive accuracy of the developed model will be tested by incubating E. coli that expresses a protein undergoing phase separation with a fluorescent small molecule.
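The modeling step described above can be illustrated with a minimal sketch. It is not the proposed implementation: random vectors stand in for the ESM-2 and Molformer embeddings, the embedding sizes and the synthetic target (a stand-in for an enrichment value) are hypothetical, and the small pure-Python perceptron is used only so that the example runs without external libraries. The names make_features, TinyMLP, and mean_loss are assumptions introduced for illustration.

```python
import math
import random


def make_features(protein_emb, molecule_emb):
    """Concatenate a protein embedding and a small-molecule embedding."""
    return list(protein_emb) + list(molecule_emb)


class TinyMLP:
    """One-hidden-layer perceptron (tanh), trained by SGD on squared error."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(n_in)
        self.w1 = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train_step(self, x, y, lr):
        h = self._hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - y
        for j, hj in enumerate(h):
            # Gradient w.r.t. the hidden pre-activation, using w2 before its update.
            grad_h = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * err


def mean_loss(model, data):
    """Mean of 0.5 * squared error over (features, target) pairs."""
    return sum(0.5 * (model.predict(x) - y) ** 2 for x, y in data) / len(data)


# Hypothetical stand-in data: 20 protein/small-molecule pairs with random
# "embeddings" (sizes 4 and 3) and a synthetic target value.
rng = random.Random(1)
data = []
for _ in range(20):
    protein_emb = [rng.uniform(-1, 1) for _ in range(4)]
    molecule_emb = [rng.uniform(-1, 1) for _ in range(3)]
    target = 0.3 * sum(protein_emb) - 0.2 * sum(molecule_emb)
    data.append((make_features(protein_emb, molecule_emb), target))

model = TinyMLP(n_in=7, n_hidden=8)
loss_before = mean_loss(model, data)
for _ in range(200):  # epochs of plain SGD over the toy dataset
    for x, y in data:
        model.train_step(x, y, lr=0.05)
loss_after = mean_loss(model, data)
print(f"mean loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

In an actual pipeline, the random vectors would be replaced by embeddings extracted from the two foundation models, the targets by the measured enrichment values, and the model would be evaluated on held-out protein and small-molecule pairs rather than on its training data.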
References
1. D. Lee, M. Walls, K. Siu, Y. Dai, K. Xu, C. Brangwynne, A. Chilkoti, J. Avalos and L. You, "Principles of metabolic pathway control by biomolecular condensates in cells," Nature Chemical Engineering, 2025.
2. S. A. Thody, H. D. Clements, H. Baniasadi, A. S. Lyon, M. S. Sigman and M. K. Rosen, "Small Molecule Properties Define Partitioning into Biomolecular Condensates," Nature Chemistry, vol. 16, pp. 1794-1802, 2024.
3. H. Kilgore, P. Mikhael, K. Overholt, A. Boija, N. Hannett, C. Van Dongen, T. Lee, Y. Chang, R. Barzilay and R. Young, "Distinct chemical environments in biomolecular condensates," Nature Chemical Biology, pp. 1-11, 2023.
4. Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y. Shmueli and A. dos Santos Costa, "Evolutionary-scale prediction of atomic-level protein structure with a language model," Science, vol. 379, no. 6637, pp. 1123-1130, 2023.
5. J. Ross, B. Belgodere, V. Chenthamarakshan, I. Padhi, Y. Mroueh and P. Das, "Large-scale chemical language representations capture molecular structure and properties," Nature Machine Intelligence, vol. 4, no. 12, pp. 1256-1264, 2022.