A taxonomy based semantic similarity of documents using the cosine measure

Ainura Madylova*, Şule Gündüz Öğüdücü

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

31 Citations (Scopus)

Abstract

In this paper, we present a new method for calculating semantic similarities between documents. The method is based on cosine similarity between concept vectors of documents, which are obtained from a taxonomy of words that captures IS-A relations. Calculating semantic similarities between documents is very time-consuming, since it first requires calculating the semantic similarity between each pair of words that appear in different documents. The proposed method reduces this computation time by employing both taxonomy-based semantic similarity and cosine similarity. First, the concept vectors of documents are obtained by extending the terms in the document vectors with their corresponding IS-A concepts. Cosine similarity is then calculated between these concept vectors. Thus, the overall similarity between documents combines cosine similarity and semantic similarity. The proposed semantic similarity is tested on a document clustering problem, and the experimental results show that the method achieves good performance.
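
The two steps named in the abstract (extending term vectors with IS-A concepts, then taking the cosine between the resulting concept vectors) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the toy taxonomy `TOY_TAXONOMY`, the helper names, and in particular the `decay` weighting given to ancestor concepts, since the abstract does not state how concepts are weighted.

```python
import math
from collections import Counter

# Hypothetical toy IS-A taxonomy (child -> parent); not taken from the paper.
TOY_TAXONOMY = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}


def ancestors(term, taxonomy):
    """Return the IS-A ancestors of a term, nearest first."""
    chain = []
    seen = {term}
    while term in taxonomy:
        term = taxonomy[term]
        if term in seen:  # guard against accidental cycles in the taxonomy
            break
        seen.add(term)
        chain.append(term)
    return chain


def concept_vector(tokens, taxonomy, decay=0.5):
    """Extend a bag-of-words vector with the IS-A concepts of each term.

    The geometric `decay` per taxonomy level is an assumed weighting scheme.
    """
    vec = Counter()
    for tok in tokens:
        vec[tok] += 1.0
        weight = decay
        for concept in ancestors(tok, taxonomy):
            vec[concept] += weight
            weight *= decay
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


doc_a, doc_b, doc_c = ["dog"], ["wolf"], ["cat"]

# Plain term vectors share no dimensions, so their cosine is zero.
plain_ab = cosine(Counter(doc_a), Counter(doc_b))

# Concept vectors overlap on shared ancestors such as "canine" and "mammal".
sim_ab = cosine(concept_vector(doc_a, TOY_TAXONOMY),
                concept_vector(doc_b, TOY_TAXONOMY))
sim_ac = cosine(concept_vector(doc_a, TOY_TAXONOMY),
                concept_vector(doc_c, TOY_TAXONOMY))
```

In this sketch each document is extended once, in time proportional to its length times the taxonomy depth, after which any pair of documents is compared with a single sparse dot product. This avoids the pairwise word-to-word comparison that the abstract identifies as the expensive step.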

Original language: English
Title of host publication: 2009 24th International Symposium on Computer and Information Sciences, ISCIS 2009
Pages: 129-134
Number of pages: 6
Publication status: Published - 2009
Event: 2009 24th International Symposium on Computer and Information Sciences, ISCIS 2009 - Guzelyurt, Cyprus
Duration: 14 Sept 2009 → 16 Sept 2009

Publication series

Name: 2009 24th International Symposium on Computer and Information Sciences, ISCIS 2009

Conference

Conference: 2009 24th International Symposium on Computer and Information Sciences, ISCIS 2009
Country/Territory: Cyprus
City: Guzelyurt
Period: 14/09/09 → 16/09/09
