Abstract
Particulate matter in the air harms human health. Yet no regression model has estimated the density of PM 10 in Istanbul using datasets with an imbalanced class distribution. To fill this gap, we designed a new regression model that first transforms the regression problem into an imbalanced binary classification problem. In this paper, PM 10 classification is treated as an imbalanced binary classification problem with two classes: harmless (1) and dangerous (0). On the sampling side, balancing the data with under-sampling methods yielded unsatisfactory results. On the algorithmic side, the performances of the RFC (Random Forest Classifier), ETC (Extra Trees Classifier), and GBC (Gradient Boosting Classifier) models, which stand out for their positive effects on imbalanced learning problems, are compared in terms of AUROC. The proposed model uses all training-set samples and predicts with RFC. The experimental results on a real-world dataset seem quite promising for our further research.
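
The pipeline outlined in the abstract (binary encoding of PM 10 readings, under-sampling, and AUROC-based comparison) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the 50 µg/m³ cutoff, the helper names `to_binary_label`, `random_under_sample`, and `auroc`, and the toy readings and scores are all assumptions made for illustration, and the classifiers themselves (RFC, ETC, GBC) are not reproduced here.

```python
import random

# Hypothetical cutoff in µg/m³; the abstract does not state the threshold used.
PM10_THRESHOLD = 50.0


def to_binary_label(pm10, threshold=PM10_THRESHOLD):
    """Encode a reading as in the abstract: harmless -> 1, dangerous -> 0."""
    return 1 if pm10 <= threshold else 0


def random_under_sample(samples, labels, seed=0):
    """Balance the two classes by randomly discarding majority-class samples."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    n_keep = min(len(by_class[0]), len(by_class[1]))
    balanced = []
    for label, group in by_class.items():
        balanced.extend((sample, label) for sample in rng.sample(group, n_keep))
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [y for _, y in balanced]


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): daily PM 10 readings and classifier scores for class 1.
pm10_readings = [22.0, 35.5, 41.0, 48.0, 30.2, 61.3, 27.9, 88.4]
labels = [to_binary_label(v) for v in pm10_readings]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.85, 0.1]

balanced_x, balanced_y = random_under_sample(pm10_readings, labels)
print("labels:", labels)
print("balanced class counts:", balanced_y.count(0), balanced_y.count(1))
print("AUROC on toy scores:", round(auroc(labels, scores), 4))
```

In this toy set the dangerous class is the minority (2 of 8 readings), so under-sampling keeps only two harmless readings. That loss of training data is one plausible reason a model trained on all samples could do better, although the abstract does not give the cause.
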
Original language | English |
---|---|
Title of host publication | UBMK 2018 - 3rd International Conference on Computer Science and Engineering |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 361-366 |
Number of pages | 6 |
ISBN (Electronic) | 9781538678930 |
Publication status | Published - 6 Dec 2018 |
Event | 3rd International Conference on Computer Science and Engineering, UBMK 2018 - Sarajevo, Bosnia and Herzegovina. Duration: 20 Sept 2018 → 23 Sept 2018 |
Publication series
Name | UBMK 2018 - 3rd International Conference on Computer Science and Engineering |
---|
Conference
Conference | 3rd International Conference on Computer Science and Engineering, UBMK 2018 |
---|---|
Country/Territory | Bosnia and Herzegovina |
City | Sarajevo |
Period | 20/09/18 → 23/09/18 |
Bibliographical note
Publisher Copyright: © 2018 IEEE.
Keywords
- air pollution
- binary classification
- data mining
- ensemble methods
- imbalanced learning