2025 AIChE Annual Meeting

(276f) Therascript: A Data-Efficient Foundation Model for Predicting mRNA Translation and Stability from UTR Sequences

Authors

Mae M. Lewis, Duke University
Giovanni Traverso, Brigham and Women's Hospital
Abstract

Deep learning has enabled predictive modeling of mRNA translation from 5′ UTRs, but current state-of-the-art models, such as Optimus 5-Prime, require hundreds of thousands of labeled sequences, limiting their accessibility and scalability. We present a data-efficient foundation model that achieves predictive accuracy comparable to that of Optimus 5-Prime using less than 1% of its training data. Our approach combines parameter-efficient architectures with unsupervised sequence pretraining to model sequence-to-translation relationships with minimal supervision. Extending beyond translation, we demonstrate that the same model can predict mRNA stability from 3′ UTR sequence features, enabling dual-objective optimization of gene expression. This unified model provides an end-to-end predictive framework for both protein production and transcript longevity, two critical dimensions of therapeutic mRNA and synthetic-biology design. Our results offer a scalable, cost-effective path toward general-purpose RNA regulatory models and suggest a shift in how labeled functional-sequence data are leveraged for biological model training.
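The dual-objective setup described above can be pictured as a shared sequence encoder feeding two task-specific heads, one scoring translation from the 5′ UTR and one scoring stability from the 3′ UTR. The sketch below is purely illustrative and is not the authors' implementation: the one-hot encoding, the mean-pooled linear "encoder" (a stand-in for the pretrained model), the embedding dimension, and all names and example sequences are assumptions.

```python
# Illustrative sketch only: a shared encoder with two prediction heads, one
# for translation (5' UTR input) and one for stability (3' UTR input).
# All layer shapes, names, and the linear mean-pooled encoder are assumptions,
# not the architecture reported in the abstract.
import numpy as np

BASES = "ACGU"

def one_hot(seq: str) -> np.ndarray:
    """Encode an RNA sequence as a (length, 4) one-hot matrix."""
    m = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        m[i, BASES.index(base)] = 1.0
    return m

class DualHeadModel:
    """Shared projection + two linear heads (hypothetical stand-in model)."""
    def __init__(self, dim: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.enc = rng.normal(size=(4, dim))          # shared encoder weights
        self.head_translation = rng.normal(size=dim)  # e.g. mean ribosome load
        self.head_stability = rng.normal(size=dim)    # e.g. transcript half-life

    def embed(self, seq: str) -> np.ndarray:
        # Mean-pool one-hot bases, then project through the shared encoder.
        return one_hot(seq).mean(axis=0) @ self.enc

    def predict_translation(self, utr5: str) -> float:
        return float(self.embed(utr5) @ self.head_translation)

    def predict_stability(self, utr3: str) -> float:
        return float(self.embed(utr3) @ self.head_stability)

model = DualHeadModel()
mrl = model.predict_translation("GGGACAUUUGCUUCUGACACAAC")  # hypothetical 5' UTR
half_life = model.predict_stability("AUUAUUUAUUUAUUUAUUUA")  # hypothetical AU-rich 3' UTR
```

Sharing the encoder across both heads is what would allow a single pretrained model to serve both objectives; in practice the encoder would be a pretrained sequence model rather than a random linear projection.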