- 2009 Annual Meeting
- Computing and Systems Technology Division
- Data Analysis: Design, Algorithms & Applications
- (507f) Predicting In Vivo Toxicities Using Optimal Methods for Re-Ordering and Machine Learning
In this work, we combine the strengths of integer linear programming (ILP) and machine learning to predict the in vivo toxicities of chemicals using only in vitro data. Our approach uses a biclustering method based on iterative optimal re-ordering [2,3] to identify biclusters, that is, subsets of chemicals that respond similarly over distinct subsets of the in vitro assays. This lets us determine which subsets of the in vitro assays are most likely to be correlated with toxicity in the in vivo data set.
An optimal ILP-based method for re-ordering sparse data matrices [4] is then applied to the in vivo data set (21.3% sparse) to cluster endpoints that have similar lowest effect level (LEL) values. We observe that the endpoints are effectively clustered according to (a) animal species and (b) similar physiological attributes. These clusters allow us to quantify the degree of toxicity of a chemical for various subsets of related animal assay endpoints.
Based on the clustering results for the in vitro and in vivo data sets, multi-class logistic regression is then used to (a) learn the correlation between the subsets of in vitro data and the in vivo responses, and (b) subsequently predict the toxicity signatures of the chemicals. Statistical analysis of our descriptors identifies which in vitro assays are correlated with the prediction of specific in vivo endpoints. Our approach aims to achieve the highest in vivo predictive ability using the minimum number of necessary in vitro descriptors.
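To make the idea of re-ordering a sparse data matrix concrete, the following is a minimal sketch and not the ILP formulation of [4]. It assumes a squared-difference distance computed only over entries observed in both rows, and it replaces the integer linear optimization with an exhaustive search that is only practical for a handful of rows. The function names and the small matrix of LEL-like values are hypothetical and chosen purely for illustration.

```python
from itertools import permutations


def row_distance(a, b):
    """Squared distance over columns observed in both rows (None = missing)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)
               if x is not None and y is not None)


def ordering_cost(matrix, order):
    """Total distance between rows that end up adjacent under `order`."""
    return sum(row_distance(matrix[i], matrix[j])
               for i, j in zip(order, order[1:]))


def optimal_row_order(matrix):
    """Return a row permutation with minimum ordering cost.

    Exhaustive search stands in for the ILP here (an assumption made for
    brevity); its cost grows factorially with the number of rows.
    """
    return min(permutations(range(len(matrix))),
               key=lambda order: ordering_cost(matrix, order))


# Made-up values: rows play the role of endpoints, columns of chemicals,
# and None marks a missing measurement.
lel = [
    [1.0, 1.0, None, 5.0],
    [5.0, 4.0, 1.0, 1.0],
    [1.0, None, 5.0, 5.0],
    [5.0, 5.0, 1.0, None],
]

best = optimal_row_order(lel)
print(best, ordering_cost(lel, best))
```

In this toy example, rows 0 and 2 end up adjacent, as do rows 1 and 3, because each pair agrees on the entries they both have. Contiguous runs of similar rows in the optimal ordering are what would then be read off as clusters.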
[1] http://www.epa.gov/ncct/toxcast
[2] DiMaggio P.A., McAllister S.R., Floudas C.A., Feng X.J., Rabinowitz J.D., and H.A. Rabitz, "Biclustering via Optimal Re-ordering of Data Matrices in Systems Biology: Rigorous Methods and Comparative Studies", BMC Bioinformatics, 9, 458 (2008).
[3] DiMaggio P.A., McAllister S.R., Floudas C.A., Feng X.J., Rabinowitz J.D., and H.A. Rabitz, "A Network Flow Model for Biclustering via Optimal Re-ordering of Data Matrices", J. Global Opt., in press (2009).
[4] McAllister S.R., DiMaggio P.A., and C.A. Floudas, "Mathematical Modeling and Efficient Optimization Methods for the Distance-Dependent Rearrangement Clustering Problem", J. Global Opt., in press (2009).