An automated approach for developing geohazard inventories using news: integrating natural language processing (NLP), machine learning, and mapping

Aydoğan Avcıoğlu*, Ogün Demir, Tolga Görüm

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

1 Citation (Scopus)

Abstract

Spatiotemporal inventories of geohazards are essential to building resilient societies, yet restricted access to global inventories hinders the development of mitigation strategies. We therefore developed an approach that enhances the potential of online newspapers for creating geohazard inventories by combining web scraping, natural language processing (NLP), clustering, and geolocation of textual data. We apply the approach to online newspapers published in Türkiye from 1997 to 2023. In the first stage, we retrieved 15 569 news articles related to wildfires, floods, landslides, and sinkholes with our tr-news-scraper tool. We then applied NLP preprocessing to refine the raw newspaper texts, which were subsequently clustered into four geohazard groups, resulting in 3928 news articles. In the final stage, we developed a method that geolocates the news using the OpenStreetMap (OSM) Nominatim tool, yielding a total of 13 940 geohazard incidents, because a single news article can report multiple incidents across various locations. In total, we mapped 9609 flood, 1834 wildfire, 1843 landslide, and 654 sinkhole formation incidents from online newspaper sources, with a spatiotemporal distribution consistent with the existing literature. The approach thus illustrates the potential of online newspaper articles for developing geohazard inventories: it draws text data from web sources and generates maps by leveraging web scraping, NLP, and mapping techniques.
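
The step from articles to incidents can be illustrated with a minimal sketch. It is not the authors' implementation: keyword matching stands in for the NLP preprocessing and clustering, and a small hypothetical gazetteer with approximate coordinates stands in for the OSM Nominatim lookup. The names `HAZARD_KEYWORDS`, `GAZETTEER`, `label_hazard`, and `extract_incidents` are assumptions made for illustration, and the sample text is in English rather than Turkish. The sketch shows why one article can yield several incidents, and it checks that the per-hazard counts reported in the abstract sum to the stated total.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the paper clusters preprocessed text instead.
HAZARD_KEYWORDS = {
    "flood": {"flood", "flooding"},
    "wildfire": {"wildfire"},
    "landslide": {"landslide"},
    "sinkhole": {"sinkhole"},
}

# Hypothetical toy gazetteer standing in for a geocoder such as Nominatim.
# Coordinates (lat, lon) are approximate and for illustration only.
GAZETTEER = {
    "rize": (41.02, 40.52),
    "artvin": (41.18, 41.82),
    "konya": (37.87, 32.48),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def label_hazard(tokens):
    """Return the hazard with the most keyword hits, or None if there are none."""
    counts = Counter()
    for hazard, words in HAZARD_KEYWORDS.items():
        counts[hazard] = sum(1 for token in tokens if token in words)
    hazard, hits = counts.most_common(1)[0]
    return hazard if hits > 0 else None


def extract_incidents(article):
    """Emit one (hazard, place, coordinates) incident per known place mentioned."""
    tokens = tokenize(article)
    hazard = label_hazard(tokens)
    if hazard is None:
        return []
    places = sorted({token for token in tokens if token in GAZETTEER})
    return [(hazard, place, GAZETTEER[place]) for place in places]


# Per-hazard incident counts as reported in the abstract.
REPORTED_COUNTS = {"flood": 9609, "wildfire": 1834, "landslide": 1843, "sinkhole": 654}
TOTAL_INCIDENTS = sum(REPORTED_COUNTS.values())
```

For example, `extract_incidents("Flooding after heavy rain hit Rize and Artvin.")` returns two flood incidents, one per place, while an article with no hazard keyword returns an empty list. A real pipeline would replace the gazetteer with geocoder requests and would need to handle ambiguous or misspelled place names.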

Original language: English
Pages (from-to): 2421-2435
Number of pages: 15
Journal: Natural Hazards and Earth System Sciences
Volume: 25
Issue number: 7
Publication status: Published - 21 Jul 2025

Bibliographical note

Publisher Copyright:
© 2025 Aydoğan Avcıoğlu et al.
