Generating landslide archive inventories for Türkiye using web scraping and natural language processing techniques

  • Elnaz Najatishendi*
  • Tolga Görüm
  • Seçkin Fidan
  • Fusun Balık Şanlı

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Landslides are among the most frequent natural hazards and cause significant loss of life and serious economic damage worldwide. Although many inventories have been created using different approaches to understand landslide events, they are rarely updated automatically or in real time. Traditional approaches are time-consuming and labor-intensive, and reporting delays often limit their timeliness. To address these challenges, we developed an automated approach that integrates web scraping, natural language processing (NLP), and geocoding, using digital news media sources in Türkiye to create a landslide archive inventory. Our algorithm verified 1727 of the 3051 news articles it captured between 1997 and 2024 as landslide reports and identified a total of 478 fatalities in 212 deadly incidents. A total of 66.5% of the landslides captured on the web were located at the neighborhood/village level, providing substantial spatial accuracy and enabling risk estimation at that level. A comparison with the manual national inventory revealed moderate agreement, with F1 scores ranging from 0.434 to 0.552 for the ±1-day and ±7-day time windows, respectively. The automated method not only captures spatial and temporal patterns of landslides but also extracts key attributes such as location, number of fatalities, and triggering factors (i.e., natural and anthropogenic). Our study demonstrates the potential of web-based automated approaches to complement traditional landslide inventories by providing near-real-time and verifiable data. Finally, we suggest adopting common standards for reporting natural hazards in digital newspapers so that this approach can be applied globally.
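
To illustrate how an F1 score over a day-based time window can be computed when comparing an automated inventory with a manual one, the following is a minimal sketch only. The abstract does not state the matching criteria used in the study, so the rule here (same place name, dates within ±k days, greedy one-to-one matching) is an assumption, and the function name `f1_within_window` and the toy events are hypothetical and not taken from the paper.

```python
from datetime import date


def f1_within_window(detected, reference, window_days):
    """F1 score for event lists of (place, date) tuples.

    Assumed matching rule (not from the paper): a detected event is a
    true positive if an unmatched reference event has the same place
    and a date within +/- window_days. Matching is greedy, one-to-one.
    """
    unmatched = list(reference)
    true_pos = 0
    for place, day in detected:
        for ref in unmatched:
            ref_place, ref_day = ref
            if place == ref_place and abs((day - ref_day).days) <= window_days:
                unmatched.remove(ref)  # each reference event matches once
                true_pos += 1
                break
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data invented for illustration; not results from the study.
detected = [
    ("Rize", date(2020, 7, 14)),
    ("Trabzon", date(2021, 3, 2)),
    ("Artvin", date(2021, 7, 22)),
    ("Giresun", date(2020, 8, 23)),
]
reference = [
    ("Rize", date(2020, 7, 13)),
    ("Trabzon", date(2021, 3, 7)),
    ("Ordu", date(2019, 1, 5)),
]

f1_1 = f1_within_window(detected, reference, 1)
f1_7 = f1_within_window(detected, reference, 7)
print(f"F1 at +/-1 day: {f1_1:.3f}, F1 at +/-7 days: {f1_7:.3f}")
```

In this toy case, widening the window from ±1 to ±7 days admits one more match, so F1 rises. This is the same direction as the scores reported above, although the actual figures depend on the study's own data and matching rules.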

Original language: English
Article number: 27
Journal: Natural Hazards
Volume: 122
Issue number: 1
Publication status: Published - Jan 2026

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive licence to Springer Nature B.V. 2025.

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 11 - Sustainable Cities and Communities

Keywords

  • Geocoding
  • Landslide inventory
  • Landslides
  • Natural language processing
  • Web scraping
