(740c) A Multi-Omic Classification Strategy for Incorporating Incomplete Datasets
2016 AIChE Annual Meeting, Food, Pharmaceutical & Bioengineering Division, Omics and High-Throughput Technologies
We constructed a multi-omic, tree-based classification strategy to evaluate the effects of incorporating samples with missing data, along with multiple data integration strategies. We evaluated the performance of our multi-omic classifier in multiple TCGA datasets that contained samples with missing data for a single data type. We compared each classifier's ability to predict survival, treatment response, and/or disease subtype, based on available clinical data. We compared the cross-validated classification performance of an identical classifier applied to both the full dataset (containing samples with missing data) and the partial dataset containing only samples for which all data types are available. Using this strategy, classification performance improves with incorporation of incomplete samples. Additionally, we systematically evaluated the effect of the fraction of incomplete data by simulating missing data from each dataset. We used these simulations to determine the threshold of missing data that can be tolerated without loss of performance. Based on results from multiple cancer datasets, the proposed multi-omic classification strategy provides an efficient method for preserving statistical power in multi-omic biomarker studies with incomplete data.
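The missing-data simulation and the full-versus-partial dataset split described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how missingness was simulated, so removing one data type from a uniformly random subset of samples is an assumption here. The function names (`mask_data_type`, `complete_cases`), the dictionary-per-sample layout, and the toy cohort are all hypothetical.

```python
import random


def mask_data_type(samples, data_type, fraction, seed=0):
    """Return a copy of `samples` where `data_type` is set to None for a
    randomly chosen `fraction` of samples (assumed missingness model)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_missing = round(fraction * len(samples))
    missing_idx = set(rng.sample(range(len(samples)), n_missing))
    masked = []
    for i, sample in enumerate(samples):
        copy = dict(sample)  # shallow copy so the input cohort is untouched
        if i in missing_idx:
            copy[data_type] = None
        masked.append(copy)
    return masked


def complete_cases(samples, data_types):
    """Keep only samples for which every listed data type is present."""
    return [s for s in samples
            if all(s.get(t) is not None for t in data_types)]


# Hypothetical toy cohort: two data types plus a clinical label per sample.
cohort = [
    {"expression": [float(i), i + 0.5], "methylation": [i / 10.0], "label": i % 2}
    for i in range(10)
]

# "Full" dataset: all samples, 30% of them lacking methylation data.
full = mask_data_type(cohort, "methylation", fraction=0.3, seed=42)
# "Partial" dataset: only samples with every data type available.
partial = complete_cases(full, ["expression", "methylation"])
```

Repeating this for a range of `fraction` values, and cross-validating the same classifier on `full` and `partial` each time, would give the kind of comparison the abstract describes. The classifier itself is left out because the abstract does not specify it beyond being tree-based.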