2024 AIChE Annual Meeting

Principal Component Analysis in Prostate Cancer Research

Prostate cancer is the most common cancer among men in the United States, excluding skin cancer. It is also a highly heterogeneous disease, which creates obstacles to treatment, study, and prediction. Computational modeling is useful for extracting information about the disease, but cost, time, and accessibility can be barriers, and the large datasets typical of prostate cancer studies carry a high computational cost. Principal Component Analysis (PCA) aims to provide accurate results while lowering that cost. PCA, implemented here primarily in Python, condenses a large dataset into a smaller number of variables while retaining most of its variance, which lowers the computational burden. PCA is then combined with a feature selection model to extract the most informative data. Gene expression data derived from prostate tumors can be run through the PCA and feature selection model, which has the potential to reveal whether some patients are at higher risk of relapse than others. Up to thirty percent of men were found to relapse after prostate tumor removal, indicating an urgent need for prediction methods. The tumors were removed from patients at various stages of prostate cancer, which gives the data high variability. Some patients reached complete remission after removal, while others experienced recurrence of the disease. The study indicates an underlying correlation between relapse and abnormalities in gene expression. No single gene has been found sufficient for prognosis, but applying PCA to the full gene expression dataset can reveal groups of related genes. PCA identifies the directions of greatest variation in the data, the principal components, and exposes the features that contribute most to them. These principal components are used as axes, and plotting the original data points against them shows how the data relate to the components.
This method can be implemented in several programming languages, making PCA an accessible tool. The goal of this experiment is to show how PCA can be an accurate step in cancer diagnosis.
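
As a rough illustration of the PCA step described above, the following Python sketch runs PCA on a synthetic gene expression matrix and ranks genes by their loading on the first principal component. It is a minimal sketch under stated assumptions, not the pipeline used in this work: the data are randomly generated, the function names `pca` and `top_genes` are hypothetical, and ranking by absolute loading is only one simple stand-in for the feature selection model mentioned in the abstract.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD of mean-centered data (rows = patients, columns = genes)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Project each patient onto the principal components (the new axes).
    scores = X_centered @ components.T
    # Fraction of total variance captured by each retained component.
    explained_ratio = (S ** 2 / np.sum(S ** 2))[:n_components]
    return scores, components, explained_ratio


def top_genes(components, k):
    """Hypothetical heuristic: indices of the k genes with the largest |loading| on PC1."""
    return np.argsort(-np.abs(components[0]))[:k]


# Synthetic data (an assumption for illustration, not real tumor measurements).
rng = np.random.default_rng(0)
n_patients, n_genes = 60, 500
X = rng.normal(size=(n_patients, n_genes))
# A shared latent factor drives the first 10 genes, mimicking a co-regulated group.
latent = rng.normal(size=(n_patients, 1))
X[:, :10] += 3.0 * latent

scores, components, explained_ratio = pca(X, n_components=2)
selected = top_genes(components, k=10)

print("scores shape:", scores.shape)  # 60 patients described by 2 components
print("explained variance ratio:", explained_ratio)
print("genes with largest PC1 loadings:", sorted(selected.tolist()))
```

Here 500 gene columns are reduced to two component scores per patient, and the genes that share the synthetic latent factor are the ones expected to dominate the first component. On real data, the scores could then be compared against relapse outcomes.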