Abstract
The proliferation of online forums and communities has greatly facilitated knowledge sharing and user support, but it has also introduced the significant challenge of managing redundant and semantically similar questions. Traditional keyword-based methods handle this poorly because the same idea can be expressed in many different ways in natural language. This study investigates three supervised machine learning models (Logistic Regression, Random Forest, and Gradient Boosting with XGBoost) for detecting semantically similar questions. The models are evaluated on the Quora Question Pairs dataset using accuracy, precision, recall, and F1-score. Beyond this comparative analysis, the research suggests a framework for improving information retrieval and user experience in online forums. The study also highlights the potential for future integration of deep learning models and advanced semantic understanding techniques to further improve the detection of semantically similar questions.
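The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary labels where 1 marks a duplicate (semantically similar) pair and 0 a non-duplicate pair. The function name `binary_metrics`, the `Metrics` container, and the toy labels are hypothetical, chosen only for illustration; they are not taken from the paper or from the Quora Question Pairs dataset.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for the positive class (1 = duplicate)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    total = tp + fp + fn + tn
    # Guard each ratio against a zero denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return Metrics(accuracy, precision, recall, f1)


# Toy labels invented for illustration only (not real model output).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
m = binary_metrics(y_true, y_pred)
print(m)
```

Reporting precision and recall alongside accuracy is useful when duplicate and non-duplicate pairs are not equally frequent, since accuracy alone can hide a model that rarely predicts the minority class.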
Original language | English |
---|---|
Title of host publication | Proceedings - 2024 IEEE 16th International Conference on Communication Systems and Network Technologies, CICN 2024 |
Editors | Geetam Singh Tomar |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 290-297 |
Number of pages | 8 |
ISBN (Electronic) | 9798331505264 |
Publication status | Published - 2024 |
Event | 16th IEEE International Conference on Computational Intelligence and Communication Networks, CICN 2024 - Indore, India. Duration: 22 Dec 2024 → 23 Dec 2024 |
Publication series
Name | Proceedings - 2024 IEEE 16th International Conference on Communication Systems and Network Technologies, CICN 2024 |
---|
Conference
Conference | 16th IEEE International Conference on Computational Intelligence and Communication Networks, CICN 2024 |
---|---|
Country/Territory | India |
City | Indore |
Period | 22/12/24 → 23/12/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- Machine Learning
- Natural Language Processing
- Sentiment Analysis
- Word Embeddings