GenoCare Prognosticator Model: Host Genetics Predict Severity of Infectious Disease
Abstract
Understanding why the severity of infectious diseases such as COVID-19 varies across patients is an important focus for the scientific community. This article presents the GenoCare Prognosticator (GCP), a voting ensemble model that combines two state-of-the-art machine learning classifiers, XGBoost and Random Forest. The models were trained on a large dataset that combines Whole Exome Sequencing (WES) data with clinical covariates such as gender and age. Five-fold stratified cross-validation was applied to improve model stability and reduce the risk of overfitting. The GCP model was validated on eighteen key features, comprising two clinical covariates and sixteen known candidate gene variants, using data from earlier studies. To improve interpretability, ExplainerDashboard, an open-source Python library, provided post-hoc explanations of the model's predictions. We also used two bioinformatics resources, OpenTarget and Enrichr, to link the identified genetic variants to relevant ontologies, biological pathways, and potential drug and disease associations. Unsupervised clustering of the SHAP values of key features revealed intricate genetic interactions that affect disease severity. Our results show that, although gender and age are the main factors influencing COVID-19 severity, complex genetic interactions drive severe symptoms in a specific subset of patients. This work advances our understanding of the biological factors that influence COVID-19 severity and offers a reliable, interpretable model that can help identify high-risk patients and guide individualized treatment plans.
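The two mechanics named in the abstract, stratified five-fold splitting and a voting ensemble, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the abstract does not say whether voting is hard or soft, so soft voting (averaging predicted probabilities) is assumed here, and the function names `stratified_folds` and `soft_vote`, the toy labels, and the probability values are all hypothetical. In practice the probabilities would come from trained XGBoost and Random Forest classifiers.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign each sample index to one of n_splits folds, keeping the
    class proportions roughly equal across folds (stratification)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


def soft_vote(prob_a, prob_b, threshold=0.5):
    """Average two models' positive-class probabilities and threshold
    the mean to obtain the ensemble label (assumed soft voting)."""
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same samples")
    mean = [(a + b) / 2 for a, b in zip(prob_a, prob_b)]
    return [1 if m >= threshold else 0 for m in mean]


# Made-up labels: 10 severe (1) and 20 non-severe (0) patients.
labels = [1] * 10 + [0] * 20
folds = stratified_folds(labels, n_splits=5)

# Hypothetical positive-class probabilities from two classifiers.
votes = soft_vote([0.9, 0.4, 0.2], [0.7, 0.7, 0.1])
```

With these toy labels every fold holds six patients, two of them severe, so each fold preserves the 1:2 class ratio of the full set. Each fold would serve once as the held-out test set while the remaining four are used for training.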
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.