TY - GEN
T1 - A taxonomy based semantic similarity of documents using the cosine measure
AU - Madylova, Ainura
AU - Öğüdücü, Şule Gündüz
PY - 2009
Y1 - 2009
N2 - In this paper, we present a new method for calculating semantic similarities between documents. The method is based on computing the cosine similarity between concept vectors of documents, which are obtained from a taxonomy of words that captures IS-A relations. Calculating semantic similarities between documents is very time-consuming, since it first requires calculating the semantic similarity between each pair of words that appear in different documents. The method presented here results in faster computation. It employs both a taxonomy-based semantic similarity and cosine similarity. First, the concept vectors of documents are obtained by extending the terms in the document vectors with their corresponding IS-A concepts. Cosine similarity is then calculated between those concept vectors. Thus, the overall similarity between documents is a combination of cosine similarity and semantic similarity. The proposed semantic similarity is tested on a document clustering problem. The experimental results show that our method achieves good performance.
AB - In this paper, we present a new method for calculating semantic similarities between documents. The method is based on computing the cosine similarity between concept vectors of documents, which are obtained from a taxonomy of words that captures IS-A relations. Calculating semantic similarities between documents is very time-consuming, since it first requires calculating the semantic similarity between each pair of words that appear in different documents. The method presented here results in faster computation. It employs both a taxonomy-based semantic similarity and cosine similarity. First, the concept vectors of documents are obtained by extending the terms in the document vectors with their corresponding IS-A concepts. Cosine similarity is then calculated between those concept vectors. Thus, the overall similarity between documents is a combination of cosine similarity and semantic similarity. The proposed semantic similarity is tested on a document clustering problem. The experimental results show that our method achieves good performance.
UR - http://www.scopus.com/inward/record.url?scp=73949117196&partnerID=8YFLogxK
U2 - 10.1109/ISCIS.2009.5291865
DO - 10.1109/ISCIS.2009.5291865
M3 - Conference contribution
AN - SCOPUS:73949117196
SN - 9781424450237
T3 - 2009 24th International Symposium on Computer and Information Sciences, ISCIS 2009
SP - 129
EP - 134
BT - 2009 24th International Symposium on Computer and Information Sciences, ISCIS 2009
T2 - 2009 24th International Symposium on Computer and Information Sciences, ISCIS 2009
Y2 - 14 September 2009 through 16 September 2009
ER -