5th Conference on Constraint-Based Reconstruction and Analysis (COBRA 2018)

Utilizing RNA-Seq Data in Bayesian Estimation of Gene Activity States

Authors

Eldon Sorensen - Presenter, Pacific Lutheran University
Jaden Boehme, Oregon State University
Angela Gasdaska, Emory University
Matthew DeJongh, Hope College
Aaron Best, Hope College
William Lindsey, Dordt College
Nathan Tintle, Dordt College

There has been recent interest in inferring gene activity states (e.g., whether a gene is active or inactive in a particular condition of interest) from genome-wide transcriptomic data. This knowledge is useful in many downstream applications, including the potential use of transcriptomic data to improve flux predictions in metabolic models. A rigorous Bayesian approach (MultiMM) was recently proposed that leverages a priori knowledge of operon structure, together with genome-wide transcriptomic data from multiple conditions, to classify gene activity states. However, MultiMM was developed for microarray data and was evaluated only on a very large set of over 900 E. coli arrays. Here, we extend the Bayesian model to RNA sequencing data and evaluate its performance. Importantly, we evaluate performance with both large (100s) and small (10s) numbers of conditions, and provide intuition on the sample sizes needed for robust performance.