Abstract
Categorization of websites is an important problem with many practical applications, one of which is parental control for a safe Internet for children. When websites are not classified according to well-defined rules, information becomes harder to access, and users of different age groups are left exposed to the harmful side of the Internet. Existing safe-Internet solutions are either not comprehensive or cannot be customized. Furthermore, blocking orders issued by the courts do not cover all harmful sites, and such sites change their domains frequently. Dynamic classification of websites based on their text content is therefore very important. In this study, websites are classified using natural language processing and machine learning techniques: the content of websites in various languages is collected and preprocessed before machine learning techniques are applied. Seventeen classes were used, and the highest classification success, 0.8756, was achieved with the SVM method.
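The abstract names the pipeline (collect text, preprocess, apply machine learning) and reports SVM as the best-performing method. It does not specify the features, kernel, or solver. The following is a rough illustration only. It assumes bag-of-words TF-IDF features and a one-vs-rest linear SVM trained with a Pegasos-style stochastic subgradient method. That solver runs in fixed example order and has no bias term. These choices, the function names, and the three toy classes are hypothetical. They are not the authors' method or their 17 classes. Only the Python standard library is used.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + word characters stands in for the
    # paper's (unspecified) preprocessing.
    return re.findall(r"\w+", text.lower())

def fit_idf(docs):
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1 for each training term.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + c)) + 1.0 for term, c in df.items()}

def tfidf(doc, idf):
    # Sparse L2-normalised TF-IDF vector; unseen terms are dropped.
    counts = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {term: c * idf[term] for term, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {term: v / norm for term, v in vec.items()}

def dot(w, x):
    return sum(w.get(term, 0.0) * v for term, v in x.items())

def train_binary_svm(xs, ys, lam=0.01, epochs=20):
    # Pegasos-style subgradient descent on the L2-regularised hinge loss
    # (no bias term; examples visited in fixed order for reproducibility).
    w, step = {}, 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            step += 1
            eta = 1.0 / (lam * step)
            violated = y * dot(w, x) < 1
            for term in w:  # regularisation shrink
                w[term] *= 1.0 - eta * lam
            if violated:  # hinge-loss subgradient step
                for term, v in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * v
    return w

def train_one_vs_rest(docs, labels, lam=0.01, epochs=20):
    idf = fit_idf(docs)
    xs = [tfidf(d, idf) for d in docs]
    models = {}
    for label in sorted(set(labels)):
        ys = [1 if y == label else -1 for y in labels]
        models[label] = train_binary_svm(xs, ys, lam, epochs)
    return idf, models

def predict(doc, idf, models):
    # One-vs-rest: pick the class with the largest linear score w . x.
    x = tfidf(doc, idf)
    return max(models, key=lambda label: dot(models[label], x))

# Hypothetical toy data (NOT the paper's corpus or its 17 classes).
TRAIN = [
    ("online casino poker bets jackpot", "gambling"),
    ("casino slots bonus bets", "gambling"),
    ("football match score league goals", "sports"),
    ("league table football transfer news", "sports"),
    ("algebra lesson homework school", "education"),
    ("school exam lesson students", "education"),
]
docs, labels = zip(*TRAIN)
idf, models = train_one_vs_rest(list(docs), list(labels))
print(predict("poker casino bonus", idf, models))  # prints: gambling
```

With a real multilingual corpus, one would more likely use a library implementation such as scikit-learn's `TfidfVectorizer` and `LinearSVC`. The hand-written version only makes the decision rule explicit: predict the class whose linear score is largest.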
Translated title of the contribution | Machine Learning for Web Content Classification |
---|---|
Original language | Turkish |
Title of host publication | Proceedings - 2020 Innovations in Intelligent Systems and Applications Conference, ASYU 2020 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781728191362 |
Publication status | Published - 15 Oct 2020 |
Externally published | Yes |
Event | 2020 Innovations in Intelligent Systems and Applications Conference, ASYU 2020, Istanbul, Turkey (15 Oct 2020 → 17 Oct 2020) |
Publication series
Name | Proceedings - 2020 Innovations in Intelligent Systems and Applications Conference, ASYU 2020 |
---|---|
Conference
Conference | 2020 Innovations in Intelligent Systems and Applications Conference, ASYU 2020 |
---|---|
Country/Territory | Turkey |
City | Istanbul |
Period | 15/10/20 → 17/10/20 |
Bibliographical note
Publisher Copyright: © 2020 IEEE.