A Predictive Modeling Framework for Studying Disparity in Colorectal Screening
With the expanding use of Electronic Health Records (EHRs), the volume of clinical data has been growing exponentially, creating opportunities for the secondary use of health data collected at the point of care. EHRs offer an inherent benefit in the management of chronic diseases such as colorectal cancer (CRC). CRC is the second leading cause of cancer-related deaths in the United States, and it is imperative to determine the factors leading to these deaths. Prior work has shown that colorectal cancer rates vary with disparities in factors such as lifestyle, health insurance coverage, socio-economic status and educational status. Studies have also identified gender and race as key sources of disparity, with males and African Americans at higher risk than females and whites, respectively. Recent studies that have evaluated racial disparity in colorectal neoplasia are limited by their reliance on non-screening populations, small sample sizes, a lack of histopathological diagnosis and experience drawn from only a few institutions. To the best of our knowledge, no published study has extracted data from a large, regionally distributed cohort and built predictive models to analyze the factors associated with racial and gender disparity in healthcare utilization. This proposal aims to develop a colorectal cancer data repository, drawing on a regionally distributed population through the participation of the Louisiana Clinical Data Research Network (LACDRN), and to develop a predictive model by identifying the underlying key factors that are highly correlated with gender and race. The proposed predictive model will then be used to compute, for each patient, a confidence value for the predicted risk of developing colon cancer.
The results of the proposed study will help close the loop between the established covariates of race and gender and other, as yet unidentified, environmental and socio-economic factors in South Louisiana. While the proposed work is directed toward colorectal cancer, the methodological approach that emerges from this project will have broader implications. In particular, it will extend to support Big Data-driven investigations into important health outcomes questions. The potential of Big Data for health outcomes research is far from fully realized, and realizing it will require more than access to broader data resources: important health outcomes research topics must be pursued in a systematic, comprehensive way. The proposed framework has the following algorithmic components as specific aims: (1) develop a modular data repository to collect data from the facilities that are part of LACDRN; (2) develop and employ machine-learning techniques to build predictive models for the factors of race, gender and the ICD-9 code that documents the farthest extension of the tumor away from the primary site; (3) integrate the predictive models developed in Specific Aim 2 into a model for lifetime risk assessment and compute the confidence of each prediction made by the model.
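The idea of pairing each prediction with a confidence value, as in the third aim, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name a model, so plain logistic regression stands in for it; the two feature columns are unnamed placeholders; and the toy rows are synthetic values, not patient data. The function names (`fit_logistic`, `predict_with_confidence`) are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_score(weights, x):
    # weights[0] is the bias; weights[1:] pair with the feature values.
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


def fit_logistic(rows, labels, lr=0.5, epochs=5000):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            err = sigmoid(linear_score(weights, x)) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict_with_confidence(weights, x):
    """Return (predicted label, probability the model assigns to that label)."""
    p = sigmoid(linear_score(weights, x))
    label = 1 if p >= 0.5 else 0
    confidence = p if label == 1 else 1.0 - p
    return label, confidence


# Synthetic toy rows (hypothetical, NOT patient data): two placeholder
# features scaled to [0, 1]; label 1 marks the "higher risk" class.
TOY_ROWS = [[0.1, 0.0], [0.2, 1.0], [0.3, 0.0], [0.4, 1.0],
            [0.6, 0.0], [0.7, 1.0], [0.8, 0.0], [0.9, 1.0]]
TOY_LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

WEIGHTS = fit_logistic(TOY_ROWS, TOY_LABELS)

if __name__ == "__main__":
    for row in ([0.05, 0.0], [0.95, 1.0]):
        label, confidence = predict_with_confidence(WEIGHTS, row)
        print(row, "->", label, round(confidence, 3))
```

Here the confidence is simply the probability the model assigns to its own predicted class, so it always lies between 0.5 and 1. A real study would likely need calibrated probabilities and held-out validation before such values could be read as patient-level risk.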
Principal Investigator: Dua, Prerna -- Engineering
Start Period: 00/00/0000
End Period: 00/00/0000
No Affiliated People