2025 AIChE Annual Meeting
(183e) Classification of Human Transcription Factors Based on Activation Domains Using Unsupervised Learning on Sequence Properties
We developed FALK22, a machine learning-based approach that classifies human transcription factors (HTFs) by their effector domains. To develop it, we analyzed sequences from 1,639 HTFs, optimized descriptors that capture sequence-dependent properties, and optimized the hyperparameter space for classification. We also used the Evolutionary Scale Model (ESM) for classification and compared our feature space with the embedding space that ESM generates for full-length HTFs. Using two independent unsupervised machine learning techniques, we identified two distinct classifications, comprising 20 and 30 clusters respectively, based on regions outside the DNA-binding domains (DBDs), each with unique patterns in amino acid composition and spacing. Coarse-grained simulations of full-length TFs from our classification further grouped the sequences into three classes of protein-protein interactions within the dense phase. This methodology provides a foundation for future research on transcriptional regulation, the effect of condensation on gene expression, and its implications for human diseases.
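The general idea of clustering sequences on composition-based descriptors can be illustrated with a minimal sketch. This is not the FALK22 pipeline: the residue groups, the five descriptors, the toy sequences, and the use of plain k-means are all assumptions chosen for illustration, since the abstract does not specify the descriptors or the clustering techniques used.

```python
from collections import Counter
from math import dist

# Illustrative residue groups (assumed; not the authors' descriptor set).
ACIDIC = set("DE")
BASIC = set("KR")
AROMATIC = set("FWY")
HYDROPHOBIC = set("AILMV")


def describe(seq):
    """Return a small composition-based feature vector for one sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)

    def frac(group):
        # Fraction of residues in the sequence that belong to the group.
        return sum(counts[a] for a in group) / n

    acidic, basic = frac(ACIDIC), frac(BASIC)
    # (acidic fraction, basic fraction, net charge per residue,
    #  aromatic fraction, hydrophobic fraction)
    return (acidic, basic, basic - acidic, frac(AROMATIC), frac(HYDROPHOBIC))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means, seeded with the first k points for determinism."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: mean of each cluster; keep old centroid if empty.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new.append(centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels


# Made-up toy sequences (hypothetical, not real effector domains).
toy = ["DDEEDDEEFW", "KKRRKKRRAA", "EEDDEEDDWF", "RRKKRRKKLL"]
features = [describe(s) for s in toy]
labels = kmeans(features, k=2)
```

In this toy input the two acidic, aromatic-containing sequences share one label and the two basic ones share the other. A realistic analysis would need far richer descriptors (including residue spacing), a principled choice of cluster count, and real sequences with the DBDs removed.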