TY - JOUR
T1 - Precipitation downscaling with the integration of multiple precipitation products, land surface data and gauge stations using explainable machine learning algorithms
T2 - A case study in the Mediterranean region of Turkiye
AU - Hisam, Enes
AU - Sertel, Elif
AU - Seker, Dursun Zafer
N1 - Publisher Copyright: © 2025 Elsevier B.V.
PY - 2025/11/1
Y1 - 2025/11/1
AB - Gridded precipitation products are used worldwide in applications ranging from climate research and drought monitoring to water resource management. Despite offering broad spatial coverage, many available datasets have coarse spatial resolution (0.1°–0.25°), which limits their effectiveness for local and regional applications. This study aims to downscale monthly gridded precipitation data to a high spatial resolution (0.04°) over the Mediterranean region of Türkiye by integrating ground-based, satellite-based, and reanalysis precipitation datasets with land surface characteristics (topography, NDVI, land surface temperature (LST), and distance from the sea). A rule-based algorithm, Cubist, and four machine learning algorithms (Random Forest, XGBoost, LightGBM, and CatBoost) were trained and validated using monthly precipitation data from 193 meteorological stations (2017–2021). We conducted experiments on different combinations of datasets. Comb1 includes eight gridded precipitation products (PERSIANN-CCS, PERSIANN-CDR, PDIR-Now, CHIRPS, GSMaP MVK v7, GSMaP Gauge v7, IMERG v6, ERA5), Comb2 consists of products with long-term observations (∼40 years), and Comb3 contains products with real-time data (∼1 h latency). Additional experiments incorporated land surface characteristics into each combination. The monthly precipitation maps produced by the models for 2016, an independent year, were compared with station-based precipitation data and showed robust statistical and visual agreement. Comb1 and Comb2 consistently outperformed their individual components, achieving high agreement with observed data (PCC > 0.79, RMSE < 39 mm, MAE < 24 mm), whereas Comb3 showed no added benefit: its performance was approximately the same as that of its components, and its resolution was similar to theirs. SHapley Additive exPlanations (SHAP) were used to interpret model predictions. IMERG and ERA5 emerged as the most influential gridded precipitation inputs across all models. Among land surface features, elevation and LST were generally the most impactful, whereas NDVI showed minimal influence.
KW - Cubist
KW - Decision tree
KW - Explainable machine learning
KW - Gridded precipitation
KW - Random Forest
KW - Reanalysis
KW - Satellite-based precipitation
UR - https://www.scopus.com/pages/publications/105016882215
U2 - 10.1016/j.scitotenv.2025.180540
DO - 10.1016/j.scitotenv.2025.180540
M3 - Article
AN - SCOPUS:105016882215
SN - 0048-9697
VL - 1002
JO - Science of the Total Environment
JF - Science of the Total Environment
M1 - 180540
ER -