2025 AIChE Annual Meeting

(588cr) Using Sequence Monte Carlo to Design Peptides with Optimized Properties

Membraneless organelles are cellular structures, commonly formed through liquid-liquid phase separation (LLPS), that carry out a number of cellular functions such as localization, compartmentalization, and filtration. These droplet-like assemblies are referred to as condensates and are made up primarily of intrinsically disordered proteins (IDPs). IDPs play an essential role in LLPS, and sequence-specific patterning parameters have been shown to provide reasonable insight into a protein's propensity for LLPS. One such parameter is sequence charge decoration (SCD), which measures the "blockiness" of charge in a sequence. These parameters are useful for predicting how LLPS propensity changes across variants of a single protein, but their absolute values are not meaningful when compared between different proteins. This can be seen, for example, in the contrast between the wide range of SCD values spanned by different proteins and the extremely narrow range of values spanned by the ensemble of fixed-composition variants of a single protein.
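To make the "blockiness" measure concrete, the sketch below computes SCD using its commonly cited form, SCD = (1/N) Σ_{i<j} q_i q_j |i - j|^{1/2}, where q_i is the charge of residue i and N is the chain length. The charge assignment (K and R = +1, D and E = -1, all other residues including His treated as neutral) is a simplifying assumption, and the function name `scd` is illustrative. More negative values indicate blockier charge patterning.

```python
import math

# Assumed residue charges: K, R = +1; D, E = -1; everything else (incl. His) = 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def scd(sequence: str) -> float:
    """Sequence charge decoration: (1/N) * sum over i<j of q_i * q_j * sqrt(j - i)."""
    q = [CHARGE.get(res, 0) for res in sequence.upper()]
    n = len(q)
    if n == 0:
        raise ValueError("sequence must be non-empty")
    total = 0.0
    for j in range(1, n):
        if q[j] == 0:
            continue  # neutral residues contribute nothing
        for i in range(j):
            if q[i]:
                # Like charges add positive terms, opposite charges negative ones,
                # weighted by the square root of their sequence separation.
                total += q[i] * q[j] * math.sqrt(j - i)
    return total / n
```

Both `KKKKEEEE` and `KEKEKEKE` have the same composition, yet the blocked arrangement scores about -2.02 while the alternating one scores about -0.45. This illustrates how SCD separates variants that differ only in patterning.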

In our work, we present a normalization scheme that gives meaning to the sequence-patterning parameters of any IDP by approximating, from composition-specific variables, the standard deviation of each parameter over the protein's fixed-composition ensemble. With SCD, for example, chain length, fraction of charged residues (FCR), and net charge per residue (NCPR) can be used for any sequence to approximate the standard deviation of SCD among all of its fixed-composition variants. A variant's parameter values can then be put in context by normalizing them with respect to this derived standard deviation.
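The fitted approximation of the standard deviation from length, FCR, and NCPR is not given in the text. The sketch below therefore substitutes a brute-force estimate: it samples random shuffles of a sequence, which preserve length, FCR, and NCPR, and takes the standard deviation of SCD over them. It then expresses a variant's SCD as a deviation from a reference sequence, in units of that standard deviation. The helper names (`fcr`, `ncpr`, `shuffled_scd_std`, `normalized_scd`), the shuffle count, the charge assignment, and the use of a reference sequence as the zero point are all illustrative assumptions, not the authors' method.

```python
import math
import random
import statistics

# Assumed charges: K, R = +1; D, E = -1; all other residues neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def scd(seq: str) -> float:
    """Sequence charge decoration: (1/N) * sum_{i<j} q_i q_j sqrt(j - i)."""
    q = [CHARGE.get(r, 0) for r in seq.upper()]
    total = sum(q[i] * q[j] * math.sqrt(j - i)
                for j in range(len(q)) for i in range(j) if q[i] and q[j])
    return total / len(q)


def fcr(seq: str) -> float:
    """Fraction of charged residues (unchanged by shuffling)."""
    return sum(1 for r in seq.upper() if r in CHARGE) / len(seq)


def ncpr(seq: str) -> float:
    """Net charge per residue (unchanged by shuffling)."""
    return sum(CHARGE.get(r, 0) for r in seq.upper()) / len(seq)


def shuffled_scd_std(seq: str, n_shuffles: int = 2000, seed: int = 0) -> float:
    """Empirical SD of SCD over random fixed-composition shuffles.

    Hypothetical stand-in for a composition-based (length, FCR, NCPR)
    approximation of the SD; this is NOT the authors' fitted formula.
    """
    rng = random.Random(seed)
    residues = list(seq)
    samples = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        samples.append(scd("".join(residues)))
    return statistics.pstdev(samples)


def normalized_scd(variant: str, reference: str, n_shuffles: int = 2000) -> float:
    """SCD of `variant` relative to `reference`, in units of the ensemble SD."""
    if sorted(variant) != sorted(reference):
        raise ValueError("variant and reference must share a composition")
    sd = shuffled_scd_std(reference, n_shuffles)
    if sd == 0:
        raise ValueError("this composition admits no SCD variation")
    return (scd(variant) - scd(reference)) / sd
```

For instance, relative to `KEKEKEKEGG`, the blockier rearrangement `KKKKEEEEGG` yields a negative normalized SCD, and the reference itself yields 0. Shuffling is exactly the slow approach the normalization scheme is meant to avoid, so this sketch only illustrates what the approximated standard deviation represents.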

Combining this normalization scheme with the connection between patterning parameters and LLPS, we develop a Monte Carlo sampling algorithm to design protein variants with different LLPS propensities. The user provides a starting sequence and a set of target values for the sequence-specific parameters that represent LLPS capability. Variants are then generated at random and evaluated by the normalized deviations of their parameter values from the target values. Each variant is accepted or rejected depending on whether it has moved closer to the target in this normalized parameter space. As a result, this tool designs protein variants with varying LLPS tendencies significantly faster than common methods such as extensive shuffling.
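The accept/reject loop described above can be sketched as follows, under several assumptions that do not come from the text. A move swaps two residues, so every trial variant stays in the fixed-composition ensemble. Only SCD is targeted, and the user supplies both its goal value and its ensemble standard deviation (for example, from a shuffle-based estimate). The distance to the goal is the Euclidean norm of the normalized deviations, and a trial is accepted only if it lowers that distance. The names `PARAMS`, `normalized_distance`, `design_variant`, and `tol` are hypothetical.

```python
import math
import random

# Assumed charges: K, R = +1; D, E = -1; all other residues neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def scd(seq: str) -> float:
    """Sequence charge decoration: (1/N) * sum_{i<j} q_i q_j sqrt(j - i)."""
    q = [CHARGE.get(r, 0) for r in seq.upper()]
    total = sum(q[i] * q[j] * math.sqrt(j - i)
                for j in range(len(q)) for i in range(j) if q[i] and q[j])
    return total / len(q)


# Hypothetical registry of patterning parameters; others could be added here.
PARAMS = {"SCD": scd}


def normalized_distance(seq: str, targets: dict, stds: dict) -> float:
    """Euclidean distance to the goal in normalized parameter space."""
    return math.sqrt(sum(((PARAMS[name](seq) - goal) / stds[name]) ** 2
                         for name, goal in targets.items()))


def design_variant(start: str, targets: dict, stds: dict,
                   n_steps: int = 5000, tol: float = 0.05, seed: int = 0):
    """Greedy Monte Carlo over fixed-composition variants of `start`."""
    rng = random.Random(seed)
    current = list(start)
    d_current = normalized_distance(start, targets, stds)
    for _ in range(n_steps):
        if d_current <= tol:          # close enough to the goal: stop early
            break
        i, j = rng.sample(range(len(current)), 2)
        if current[i] == current[j]:  # swapping identical residues changes nothing
            continue
        trial = current.copy()
        trial[i], trial[j] = trial[j], trial[i]
        d_trial = normalized_distance("".join(trial), targets, stds)
        if d_trial < d_current:       # accept only moves that get closer to the goal
            current, d_current = trial, d_trial
    return "".join(current), d_current
```

A typical call might target the SCD of a blockier arrangement, e.g. `design_variant("KEKEKEKEGGGGKEKE", {"SCD": scd("KKKKKKEEEEEEGGGG")}, {"SCD": 0.5})`, where 0.5 is a placeholder standard deviation. The strictly "closer or reject" rule mirrors the description above. A Metropolis criterion, which occasionally accepts moves away from the goal, is a common alternative when greedy search stalls in local minima.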