Abstract
This study develops a season-aware machine-learning (ML) framework to predict hourly concentrations of PM10, PM2.5 and O3 across İstanbul. A 2021–2023 dataset was compiled from three co-located air-quality and meteorological monitoring stations representing contrasting source regimes: a traffic-dominated urban site, a rural background site, and a semi-urban coastal site. Seven ML algorithms were compared: eXtreme Gradient Boosting (XGBoost), Extra Trees (ETR), Random Forest (RF), Adaptive Boosting (AdaBoost), Multi-Layer Perceptron (MLP), k-Nearest Neighbors (KNN) and Support Vector Regression (SVR). Hyperparameters were optimized using five-fold cross-validated Bayesian search, and models were evaluated with multiple performance indicators on season-withheld test sets. In the winter months, ETR achieved a mean R² = 0.93 (RMSE ≈ 10 µg/m³) for PM10 at Bağcılar, while XGBoost yielded R² = 0.88 for O3 at the same site. Summer predictions were more challenging: PM10 skill at rural Arnavutköy dropped to R² = 0.61 despite strong training fits, highlighting over-fitting risks under complex, non-stationary chemical conditions. By contrast, MLP maintained robust urban performance for PM2.5 (summer test R² = 0.80) and KNN provided the most stable O3 prediction in rural areas (R² = 0.74). To enhance interpretability, SHAP (SHapley Additive exPlanations) analysis was applied to the best-performing models, enabling a transparent assessment of how meteorological and co-pollutant inputs shaped predictions at each site. The proposed framework demonstrates that data-driven models can complement traditional air-quality modeling systems by providing station-level insights and interpretable relationships between pollutants and meteorological drivers, supporting air-quality assessment and policy-relevant analyses in rapidly urbanizing regions.
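The season-wise scores quoted above (R² and RMSE per season) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the meteorological-season mapping (December to February as winter, and so on), and the input layout are assumptions made for illustration only, and the abstract does not state how seasons were delimited.

```python
import math
from collections import defaultdict
from datetime import datetime

# Assumed mapping: meteorological seasons for the Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the target (e.g. µg/m³)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        return float("nan")  # undefined when the observations are constant
    return 1.0 - ss_res / ss_tot


def seasonal_scores(timestamps, y_true, y_pred):
    """Group hourly observations and predictions by season and score each group."""
    groups = defaultdict(lambda: ([], []))
    for ts, obs, pred in zip(timestamps, y_true, y_pred):
        season = SEASON_BY_MONTH[ts.month]
        groups[season][0].append(obs)
        groups[season][1].append(pred)
    return {
        season: {"r2": r2(obs, pred), "rmse": rmse(obs, pred)}
        for season, (obs, pred) in groups.items()
    }
```

Calling `seasonal_scores` with a list of `datetime` timestamps and matching observed and predicted concentrations returns one `{"r2": ..., "rmse": ...}` entry per season present in the data, which mirrors the way the abstract reports winter and summer skill separately.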
| Original language | English |
|---|---|
| Article number | 37 |
| Journal | Stochastic Environmental Research and Risk Assessment |
| Volume | 40 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - Feb 2026 |
Bibliographical note
Publisher Copyright: © The Author(s) 2026.
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 11 Sustainable Cities and Communities
- SDG 14 Life Below Water
Keywords
- Machine learning
- O3
- PM10
- PM2.5
- SHAP
- İstanbul
Fingerprint
Dive into the research topics of 'Interpretable machine learning framework for air quality prediction in Istanbul using Shapley additive explanations (SHAP)'. Together they form a unique fingerprint.