Abstract
Single Nucleotide Polymorphisms (SNPs) are point mutations of DNA that play an important role in understanding genetic diseases. Determining whether a SNP has the potential to cause disease is a complex and crucial task. Several guidelines exist for evaluating the pathogenicity of genomic variation, and the ACMG/AMP criteria are among the most widely accepted frameworks. These guidelines consider various biological and clinical features of a variant. Among these, in silico tools (computer-based algorithms that predict variant pathogenicity) play an important role. Various predictor algorithms have been developed in recent years to determine the pathogenicity of variants. However, experiments have indicated limited concordance and accuracy among these algorithms. This study compares tree-based machine learning models, including Decision Tree, Random Forest, XGBoost, and CatBoost, for predicting SNP pathogenicity using a benchmark dataset from the ClinVar human variation archive. Our methodology integrates several in silico prediction scores to improve overall prediction accuracy. Our experiments demonstrate that combining the predictions of different tools with a suitable machine learning model, such as CatBoost, outperforms stand-alone predictors, reaching an AUPRC of 98.1%. These results indicate that machine learning methods can substantially improve pathogenicity assessment, providing a more reliable tool for genetic disease research and diagnosis. The study underlines the power of combining traditional in silico methods with machine learning models to achieve better predictive outcomes.
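As a rough illustration of the evaluation metric named in the abstract, the sketch below computes average precision, one common step-wise estimate of the area under the precision-recall curve (AUPRC). It is not the authors' pipeline: the labels and the two score lists (`tool_a`, `tool_b`) are invented toy values, and the plain mean used to combine them is only a stand-in for a learned model such as CatBoost.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 for pathogenic, 0 for benign.
    scores: higher means "more likely pathogenic".
    Assumes no tied scores; ties would need threshold grouping.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank variants from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at the rank where recall increases.
            precision_sum += hits / rank
    return precision_sum / total_pos


# Toy data (hypothetical, for illustration only).
labels = [1, 1, 1, 0, 0, 0]
tool_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]  # hypothetical stand-alone predictor
tool_b = [0.6, 0.9, 0.3, 0.2, 0.7, 0.1]  # hypothetical stand-alone predictor

# Naive combination: the mean of the two scores. A trained model would
# learn how to weight each tool; this mean is only a placeholder.
combined = [(a + b) / 2 for a, b in zip(tool_a, tool_b)]

for name, scores in [("tool_a", tool_a), ("tool_b", tool_b), ("combined", combined)]:
    print(f"{name}: AP = {average_precision(labels, scores):.3f}")
```

On this toy data the combined score ranks all pathogenic variants above all benign ones, which neither stand-alone score does. That mirrors the abstract's claim only qualitatively and says nothing about the reported 98.1%.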
Original language | English |
---|---|
Title of host publication | UBMK 2024 - Proceedings |
Subtitle of host publication | 9th International Conference on Computer Science and Engineering |
Editors | Esref Adali |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 689-694 |
Number of pages | 6 |
ISBN (Electronic) | 9798350365887 |
Publication status | Published - 2024 |
Event | 9th International Conference on Computer Science and Engineering, UBMK 2024 - Antalya, Turkey. Duration: 26 Oct 2024 → 28 Oct 2024 |
Publication series
Name | UBMK 2024 - Proceedings: 9th International Conference on Computer Science and Engineering |
---|---|
Conference
Conference | 9th International Conference on Computer Science and Engineering, UBMK 2024 |
---|---|
Country/Territory | Turkey |
City | Antalya |
Period | 26/10/24 → 28/10/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- in silico tools
- machine learning
- pathogenicity
- SNP
- variant interpretation