Utilizing Tree-Based Algorithms for Genetic Variant Interpretation

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Single Nucleotide Polymorphisms (SNPs) are point mutations in DNA that play an important role in understanding genetic diseases. Determining whether a SNP has the potential to cause disease is a complex and crucial task. Several guidelines exist for evaluating the pathogenicity of genomic variation, and the ACMG/AMP criteria are among the most widely accepted frameworks. These guidelines consider various biological and clinical features of a variant. Among these, in silico tools (computer-based algorithms that predict variant pathogenicity) play an important role. Various predictor algorithms have been developed in recent years to determine the pathogenicity of variants. However, experiments have indicated limited concordance and accuracy among these algorithms. This study compares tree-based machine learning models, including Decision Tree, Random Forest, XGBoost, and CatBoost, for predicting SNP pathogenicity using a benchmark dataset from the ClinVar human variation archive. Our methodology integrates several in silico prediction scores to improve overall prediction accuracy. Our experiments demonstrate that combining the predictions of different tools with a suitable machine learning model, such as CatBoost, outperforms stand-alone predictors, reaching an AUPRC of 98.1%. These results indicate that pathogenicity assessment can be substantially improved by machine learning methods, providing a more reliable tool for genetic disease research and diagnosis. The study underlines the power of combining traditional in silico methods with machine learning models to achieve better predictive outcomes.
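
The AUPRC reported above is commonly estimated as average precision over variants ranked by predicted score. The following is a minimal sketch of that calculation, not the study's code. The labels and the two tool scores (`tool_a`, `tool_b`) are invented by hand for illustration and do not come from ClinVar or from any real predictor. The plain mean of the two scores is only a stand-in for "combining predictions"; the study itself combines scores with tree-based models such as CatBoost.

```python
def average_precision(labels, scores):
    """Estimate AUPRC as average precision.

    labels: 1 = pathogenic, 0 = benign. scores: higher = more likely pathogenic.
    Variants sharing the same score are treated as one threshold.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one pathogenic (positive) label")

    # Rank variants from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    ap = 0.0
    true_pos = 0
    seen = 0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every variant tied at this score.
        while i < len(ranked) and ranked[i][0] == threshold:
            true_pos += ranked[i][1]
            seen += 1
            i += 1
        recall = true_pos / total_pos
        precision = true_pos / seen
        # Each recall increment is weighted by the precision at that threshold.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical toy data: 8 variants, 4 pathogenic and 4 benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
tool_a = [0.9, 0.8, 0.3, 0.7, 0.6, 0.2, 0.1, 0.4]  # hypothetical predictor A
tool_b = [0.6, 0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1]  # hypothetical predictor B

# Naive combination (an assumption for illustration, not the paper's method).
combined = [(a + b) / 2 for a, b in zip(tool_a, tool_b)]

ap_a = average_precision(labels, tool_a)
ap_b = average_precision(labels, tool_b)
ap_combined = average_precision(labels, combined)

print(f"tool A   AUPRC: {ap_a:.4f}")
print(f"tool B   AUPRC: {ap_b:.4f}")
print(f"combined AUPRC: {ap_combined:.4f}")
```

On this hand-built example each tool misranks a different pathogenic variant, so pooling the scores ranks all pathogenic variants first. That shows why integration can help, but says nothing about how much it helps on real data.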

Original language: English
Title of host publication: UBMK 2024 - Proceedings
Subtitle of host publication: 9th International Conference on Computer Science and Engineering
Editors: Esref Adali
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 689-694
Number of pages: 6
ISBN (Electronic): 9798350365887
Publication status: Published - 2024
Event: 9th International Conference on Computer Science and Engineering, UBMK 2024 - Antalya, Turkey
Duration: 26 Oct 2024 - 28 Oct 2024

Publication series

Name: UBMK 2024 - Proceedings: 9th International Conference on Computer Science and Engineering

Conference

Conference: 9th International Conference on Computer Science and Engineering, UBMK 2024
Country/Territory: Turkey
City: Antalya
Period: 26/10/24 - 28/10/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Keywords

  • in silico tools
  • machine learning
  • pathogenicity
  • SNP
  • variant interpretation
