Abstract
Identifying gene structure in the human genome depends heavily on accurate recognition of the boundaries between exons and introns, i.e. splice sites. Hence, developing new methods for effective splice site detection is essential. DNA encoding approaches extract features from gene sequences, and machine learning methods then classify splice sites using those features. This paper presents a new DNA encoding method based on triplet nucleotide encoding with the frequency difference between true and false splice site sequences (TN-FDTF). Support Vector Machine (SVM), Artificial Neural Network (NN), Random Forest (RF) and AdaBoost classifiers are then used to predict splice sites. The performance of the proposed method was assessed on the Homo Sapiens Splice Site Dataset (HS3D) using 10-fold cross-validation. The results showed that AdaBoost outperformed the other classifiers considered. In addition, the proposed method achieved higher prediction accuracy than most existing state-of-the-art methods. The proposed method is expected to help improve human splice site recognition and eukaryotic gene detection.
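
The abstract does not spell out the TN-FDTF formula, so the following is only a minimal sketch of one plausible reading: for each position in a fixed-length window, take the relative frequency of each nucleotide triplet among true splice site sequences, subtract its frequency among false ones, and encode a sequence by looking up the difference for the triplet it carries at each position. The function names (`positional_triplet_freqs`, `fit_fdtf`, `encode`) and the toy sequences are hypothetical and are not taken from the paper or from HS3D.

```python
from collections import Counter
from itertools import product

# All 64 possible nucleotide triplets.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def positional_triplet_freqs(seqs):
    """For each window position, map every triplet to its relative frequency.

    Assumes all sequences have the same length (an assumption of this sketch).
    """
    n_positions = len(seqs[0]) - 2
    freqs = []
    for i in range(n_positions):
        counts = Counter(s[i:i + 3] for s in seqs)
        freqs.append({t: counts[t] / len(seqs) for t in TRIPLETS})
    return freqs


def fit_fdtf(true_seqs, false_seqs):
    """Build a per-position table of (true frequency - false frequency)."""
    f_true = positional_triplet_freqs(true_seqs)
    f_false = positional_triplet_freqs(false_seqs)
    return [
        {t: ft[t] - ff[t] for t in TRIPLETS}
        for ft, ff in zip(f_true, f_false)
    ]


def encode(seq, table):
    """Encode a sequence as one frequency-difference value per position.

    Triplets containing symbols other than A, C, G, T map to 0.0.
    """
    return [table[i].get(seq[i:i + 3], 0.0) for i in range(len(table))]


# Toy data, invented for illustration only.
true_seqs = ["AAGGTAAG", "CAGGTGAG", "AAGGTAAG"]
false_seqs = ["ACGGTCCA", "TTGGTTTC", "GCGGTACC"]

table = fit_fdtf(true_seqs, false_seqs)
features = encode("AAGGTAAG", table)
```

Under this reading, the resulting numeric vectors would be the input to classifiers such as SVM, NN, RF or AdaBoost. In a cross-validation setting the table should be fitted on the training folds only, so that the frequency differences do not leak information from the held-out sequences.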
Original language | English |
---|---|
Title of host publication | 2nd International Conference on Computer Science and Engineering, UBMK 2017 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 586-591 |
Number of pages | 6 |
ISBN (Electronic) | 9781538609309 |
Publication status | Published - 31 Oct 2017 |
Externally published | Yes |
Event | 2nd International Conference on Computer Science and Engineering, UBMK 2017 - Antalya, Turkey. Duration: 5 Oct 2017 → 8 Oct 2017 |
Publication series
Name | 2nd International Conference on Computer Science and Engineering, UBMK 2017 |
---|---|
Conference
Conference | 2nd International Conference on Computer Science and Engineering, UBMK 2017 |
---|---|
Country/Territory | Turkey |
City | Antalya |
Period | 5 Oct 2017 → 8 Oct 2017 |
Bibliographical note
Publisher Copyright: © 2017 IEEE.
Keywords
- DNA encoding methods
- Gene detection
- Machine learning
- Splice site prediction