Precipitation downscaling with the integration of multiple precipitation products, land surface data and gauge stations using explainable machine learning algorithms: A case study in the Mediterranean region of Turkiye

Enes Hisam, Elif Sertel*, Dursun Zafer Seker

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Gridded precipitation products are used worldwide in applications ranging from climate research and drought monitoring to water resource management. Despite their broad spatial coverage, many available datasets have coarse spatial resolution (0.1°–0.25°), which limits their usefulness for local and regional applications. This study downscales monthly gridded precipitation data to a high spatial resolution (0.04°) over the Mediterranean region of Türkiye by integrating ground-based, satellite-based, and reanalysis precipitation datasets with land surface characteristics (topography, NDVI, land surface temperature (LST), and distance from the sea). A rule-based algorithm (Cubist) and four machine learning algorithms (Random Forest, XGBoost, LightGBM, and CatBoost) were trained and validated using monthly precipitation data from 193 meteorological stations (2017–2021). We ran experiments with different combinations of input datasets. Comb1 includes eight gridded precipitation products (PERSIANN-CCS, PERSIANN-CDR, PDIR-Now, CHIRPS, GSMaP MVK v7, GSMaP Gauge v7, IMERG v6, ERA5), Comb2 consists of products with long-term observations (∼40 years), and Comb3 contains products with real-time data (∼1 h latency). Additional experiments incorporated land surface characteristics into each combination. The monthly precipitation maps produced by the models for 2016, an independent year, were compared with station-based precipitation data and showed robust statistical and visual agreement. Comb1 and Comb2 consistently outperformed their individual components, achieving high agreement with observed data (PCC > 0.79, RMSE < 39 mm, MAE < 24 mm), while Comb3 showed no added benefit, as both its performance and its resolution were approximately the same as those of its components. SHapley Additive exPlanations (SHAP) were used to interpret model predictions. IMERG and ERA5 emerged as the most influential gridded precipitation inputs across all models. Among land surface features, elevation and LST were generally the most impactful, whereas NDVI showed minimal influence.
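
The three agreement measures quoted in the abstract (PCC, RMSE, MAE) can be illustrated with a minimal sketch. It assumes the standard definitions of the Pearson correlation coefficient, root mean square error, and mean absolute error; the function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def pcc(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    covariance = sum((o - mean_obs) * (p - mean_pred)
                     for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return covariance / math.sqrt(var_obs * var_pred)


def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (mm here)."""
    squared = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mae(observed, predicted):
    """Mean absolute error, in the units of the inputs (mm here)."""
    absolute = sum(abs(p - o) for o, p in zip(observed, predicted))
    return absolute / len(observed)


# Made-up monthly totals (mm) for four stations; illustrative only,
# not data from the study.
station_mm = [10.0, 20.0, 30.0, 40.0]
model_mm = [12.0, 18.0, 33.0, 37.0]

print(f"PCC  = {pcc(station_mm, model_mm):.3f}")
print(f"RMSE = {rmse(station_mm, model_mm):.3f} mm")
print(f"MAE  = {mae(station_mm, model_mm):.3f} mm")
```

RMSE penalizes large errors more heavily than MAE, so reporting both, as the abstract does, indicates whether the error is dominated by a few large misses or spread evenly across stations.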

Original language: English
Article number: 180540
Journal: Science of the Total Environment
Volume: 1002
Publication status: Published - 1 Nov 2025

Bibliographical note

Publisher Copyright:
© 2025 Elsevier B.V.

Keywords

  • Cubist
  • Decision tree
  • Explainable machine learning
  • Gridded precipitation
  • Random Forest
  • Reanalysis
  • Satellite-based precipitation
