TY - GEN
T1 - Comparison of semantic and single term similarity measures for clustering Turkish documents
AU - Yücesoy, Bülent
AU - Öǧüdücü, Şule Gündüz
PY - 2007
Y1 - 2007
AB - With the rapid growth of the World Wide Web (WWW), it has become critical to design and organize the vast amount of online documents on the web according to their topics. Grouping similar documents is also very important for search engines, since it can improve their performance when a query is submitted to the system. Clustering is useful for taxonomy design and for similarity search of documents in such a domain. Similarity is fundamental to many clustering applications on hypertext. In this paper, we study how similarity measures are used to cluster a collection of documents on a web site. Most document clustering techniques rely on single-term analysis of text, such as the vector space model. To better group related documents, we propose a new semantic similarity measure. We compare our measure with Wu-Palmer similarity and cosine similarity. Experimental results show that cosine similarity performs better than the semantic similarity measures. We demonstrate our results on Turkish documents. This is the first study that considers semantic similarities between Turkish documents.
UR - http://www.scopus.com/inward/record.url?scp=47349083904&partnerID=8YFLogxK
U2 - 10.1109/ICMLA.2007.32
DO - 10.1109/ICMLA.2007.32
M3 - Conference contribution
AN - SCOPUS:47349083904
SN - 0769530699
SN - 9780769530697
T3 - Proceedings - 6th International Conference on Machine Learning and Applications, ICMLA 2007
SP - 393
EP - 398
BT - Proceedings - 6th International Conference on Machine Learning and Applications, ICMLA 2007
T2 - 6th International Conference on Machine Learning and Applications, ICMLA 2007
Y2 - 13 December 2007 through 15 December 2007
ER -