Abstract
There is increasing interest in global dynamic soil information, with changes in soil properties mapped over time and at high spatial resolution. Long-term, multi-temporal, fine- and medium-resolution satellite missions such as Landsat, MODIS, Copernicus Sentinel and similar now make it possible to produce globally consistent predictions of key soil variables that match other global data sets at 10–30 m spatial resolution. This paper describes the data preparation, modeling, and production of OpenLandMap-soildb: global dynamic predictions of soil organic carbon (SOC) content, SOC density, bulk density, soil pH in H2O, soil texture fractions (clay, sand and silt) and soil types (USDA soil taxonomy subgroups) at 30 m spatial resolution. The predictions are based on spatiotemporal Machine Learning (Quantile Regression Random Forest), with outputs giving the mean prediction plus the lower and upper bounds of the 68 % prediction interval. To train the models, a large compilation of soil samples imported from legacy soil projects was used: 216 000 samples with SOC density (kg m−3), 408 000 with SOC content (g kg−1), 272 000 with soil pH in H2O, 363 000 with clay, silt and sand content (%) and 134 000 with oven-dry bulk density (t m−3). SOC and soil pH were mapped at 5-year time intervals; soil texture fractions, bulk density, and soil types were mapped for recent years only. The cross-validation results indicate a Root Mean Square Error (RMSE) of 17.7 kg m−3 (0.486 in log-scale) and a Concordance Correlation Coefficient (CCC) of 0.88 for SOC density; RMSE of 51.3 g kg−1 (0.574 in log-scale) and CCC of 0.87 for SOC content; RMSE of 0.15 t m−3 and CCC of 0.92 for fine-earth bulk density; RMSE of 0.51 and CCC of 0.91 for soil pH; RMSE of 8.4 % and CCC of 0.87 for clay content; and RMSE of 12.6 % and CCC of 0.84 for sand content.
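The accuracy figures above are reported as RMSE and Lin's Concordance Correlation Coefficient. The following is a minimal sketch of how these two metrics can be computed from paired observed and predicted values. It is not the authors' cross-validation code: the sample values are hypothetical, and the use of `log1p` for the "log-scale" RMSE is an assumption, since the abstract does not state which log transform was applied.

```python
import numpy as np

def rmse(observed, predicted):
    """Root Mean Square Error between observed and predicted values."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def ccc(observed, predicted):
    """Lin's Concordance Correlation Coefficient.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (biased) variances and covariance. Unlike Pearson's r,
    CCC is penalized by any bias or scale shift between x and y.
    """
    x = np.asarray(observed, dtype=float)
    y = np.asarray(predicted, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return float(2.0 * cov / (x.var() + y.var() + (mx - my) ** 2))

# Hypothetical SOC density values (kg m-3), for illustration only
obs = np.array([5.0, 12.0, 30.0, 80.0])
pred = np.array([6.0, 10.0, 35.0, 70.0])

print("RMSE:", rmse(obs, pred))                                  # original units
print("RMSE (log-scale):", rmse(np.log1p(obs), np.log1p(pred)))  # log1p assumed
print("CCC:", ccc(obs, pred))
```

Reporting RMSE on a log scale alongside the original units is useful for skewed variables such as SOC, where a few carbon-rich (e.g. peat) samples can dominate the untransformed RMSE.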
The most important variables for predicting soil organic carbon density (kg m−3) were soil depth, Landsat-based uncalibrated Gross Primary Productivity (GPP), the Normalized Difference Vegetation Index (NDVI) and CHELSA bioclimatic indices. The global distribution of soil pH can be explained primarily by the long-term CHELSA Aridity Index, annual precipitation, and salinity grade. Global SOC stocks for the 2020–2022+ period in the 0–30 cm depth interval are estimated at 461 Pg (petagrams); the results further indicate that the world has lost at least 11 Pg of topsoil SOC over the last 25 years. Suggestions are made on how to set up global permanent monitoring stations to accurately track land degradation and enable land restoration projects. The training data set is available at https://doi.org/10.5281/zenodo.4748499 (Hengl and Gupta, 2025), while the resulting data products can be accessed at https://doi.org/10.5281/zenodo.15470431 (Consoli et al., 2025) and https://world.soils.app (OpenGeoHub Foundation, 2026). Both data sets are released under a CC-BY license.
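A stock figure such as 461 Pg is obtained by integrating SOC density (kg m−3) over soil depth and summing over land area. The sketch below illustrates only the unit arithmetic behind such a figure, assuming layer-wise constant SOC density that already refers to the whole soil volume, and a uniform nominal pixel area. The function names are hypothetical; the paper's actual aggregation (for example, latitude-dependent pixel areas on a geographic grid) is not described in this abstract, and this code does not reproduce the reported totals.

```python
import numpy as np

PG_IN_KG = 1e12  # 1 petagram = 1e15 g = 1e12 kg

def soc_stock_kg_m2(densities_kg_m3, depth_bounds_cm):
    """Integrate SOC density over depth layers to a stock per unit area (kg m-2).

    densities_kg_m3: one SOC density value per layer (kg m-3), assumed constant
                     within each layer
    depth_bounds_cm: layer boundaries in cm, e.g. [0, 10, 20, 30]
    """
    d = np.asarray(densities_kg_m3, dtype=float)
    thickness_m = np.diff(np.asarray(depth_bounds_cm, dtype=float) / 100.0)
    if d.shape != thickness_m.shape:
        raise ValueError("need exactly one density value per depth layer")
    return float(np.sum(d * thickness_m))

def total_stock_pg(stocks_kg_m2, pixel_area_m2):
    """Sum per-pixel stocks (kg m-2) times pixel area (m2) and convert to Pg."""
    s = np.asarray(stocks_kg_m2, dtype=float)
    a = np.broadcast_to(np.asarray(pixel_area_m2, dtype=float), s.shape)
    return float(np.sum(s * a) / PG_IN_KG)

# Hypothetical pixel: 20, 15 and 10 kg m-3 in the 0-10, 10-20 and 20-30 cm layers
stock = soc_stock_kg_m2([20, 15, 10], [0, 10, 20, 30])  # ≈ 4.5 kg m-2
# Two such pixels at a nominal 30 m x 30 m = 900 m2 each
print(total_stock_pg([stock, stock], pixel_area_m2=900.0))
```

The conversion factor makes the scale of the global estimate easy to check: 461 Pg corresponds to 4.61 × 10^14 kg of carbon.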
| Original language | English |
|---|---|
| Pages (from-to) | 989-1036 |
| Number of pages | 48 |
| Journal | Earth System Science Data |
| Volume | 18 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - 6 Feb 2026 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © Author(s) 2026.
Title: OpenLandMap-soildb: global soil information at 30 m spatial resolution for 2000–2022+ based on spatiotemporal Machine Learning and harmonized legacy soil samples and observations