Performance analysis of Naïve Bayes classification, Support Vector Machines and Neural Networks for spam categorization

A. Cüneyd Tantuǧ*, Gülşen Eryiǧit

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

8 Citations (Scopus)

Abstract

Spam mail recognition is a new and growing field that brings together natural language processing and machine learning, as it is in essence a two-class classification of natural language texts. An important feature of spam recognition is that it is cost-sensitive: misclassifying a non-spam mail as spam is generally a more severe error than misclassifying a spam mail as non-spam. To be comparable, the methods applied to this field should all be evaluated on the same corpus and within the same cost-sensitive framework. In this paper, the performance of Support Vector Machines (SVM), Neural Networks (NN) and Naïve Bayes (NB) techniques is compared using a publicly available corpus (LINGSPAM) for different cost scenarios. The training time complexities of the methods are also evaluated. The results show that NN performs significantly better than the other two methods while having acceptable training times. NB gives better results than SVM when the cost is extremely high, while in all other cases SVM outperforms NB.
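The cost asymmetry described in the abstract can be illustrated with a small sketch. It assumes a cost-weighted accuracy in which every legitimate message counts as `lam` messages; the function name, the label strings and the weight `lam` are illustrative assumptions, and the abstract does not state which measure the chapter actually uses.

```python
def weighted_accuracy(y_true, y_pred, lam=1.0):
    """Accuracy in which each legitimate message counts lam times.

    Assumption for illustration: labels are the strings "legit" and "spam",
    and lam >= 1 expresses how much worse it is to block a legitimate
    message than to let a spam message through.
    """
    pairs = list(zip(y_true, y_pred))
    legit_total = sum(1 for t, _ in pairs if t == "legit")
    spam_total = sum(1 for t, _ in pairs if t == "spam")
    legit_ok = sum(1 for t, p in pairs if t == "legit" and p == "legit")
    spam_ok = sum(1 for t, p in pairs if t == "spam" and p == "spam")
    return (lam * legit_ok + spam_ok) / (lam * legit_total + spam_total)


# Toy data (made up): four legitimate messages and two spam messages.
y_true = ["legit", "legit", "legit", "legit", "spam", "spam"]
# Filter A blocks one legitimate message but catches all spam.
filter_a = ["legit", "legit", "legit", "spam", "spam", "spam"]
# Filter B lets one spam message through but blocks no legitimate mail.
filter_b = ["legit", "legit", "legit", "legit", "spam", "legit"]

for lam in (1, 9):
    print(lam,
          round(weighted_accuracy(y_true, filter_a, lam), 3),
          round(weighted_accuracy(y_true, filter_b, lam), 3))
```

With `lam = 1` both toy filters score the same, since each makes one error. With `lam = 9` the filter that blocks a legitimate message scores clearly lower, which is why a ranking of classifiers can change from one cost scenario to another.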

Original language: English
Title of host publication: Applied Soft Computing Technologies
Subtitle of host publication: The Challenge of Complexity
Editors: Ajith Abraham, Bernard Baets, Mario Koeppen, Bertram Nickolay
Pages: 495-504
Number of pages: 10
Publication status: Published - 2006

Publication series

Name: Advances in Soft Computing
Volume: 34
ISSN (Print): 1615-3871
ISSN (Electronic): 1860-0794
