TY - JOUR
T1 - An automated approach for developing geohazard inventories using news
T2 - integrating natural language processing (NLP), machine learning, and mapping
AU - Avcıoğlu, Aydoğan
AU - Demir, Ogün
AU - Görüm, Tolga
N1 - Publisher Copyright:
© 2025 Aydoğan Avcıoğlu et al.
PY - 2025/7/21
Y1 - 2025/7/21
AB - Spatiotemporal inventories of geohazards are essential for building resilient societies, yet restricted access to global inventories hinders the advancement of mitigation strategies. We therefore developed an approach that enhances the potential of online newspapers for creating geohazard inventories by combining web scraping, natural language processing (NLP), clustering, and geolocation of textual data. Here, we apply the approach to online newspapers published in Türkiye from 1997 to 2023. In the first stage, we retrieved 15 569 news articles on wildfires, floods, landslides, and sinkholes using our tr-news-scraper tool. We then applied NLP preprocessing to refine the raw newspaper texts and clustered them into four geohazard groups, resulting in 3928 news articles. In the final stage, we developed a method that geolocates the news using the OpenStreetMap (OSM) Nominatim tool, yielding a total of 13 940 geohazard incidents, since a single news article can report multiple incidents across various locations. As a result, we mapped 9609 flood, 1834 wildfire, 1843 landslide, and 654 sinkhole formation incidents from online newspaper sources, with a spatiotemporal distribution consistent with the existing literature. Overall, our approach illustrates the potential of online newspaper articles for developing geohazard inventories: it draws text data from web sources and generates maps by combining web scraping, NLP, and mapping techniques.
UR - https://www.scopus.com/pages/publications/105017289175
U2 - 10.5194/nhess-25-2421-2025
DO - 10.5194/nhess-25-2421-2025
M3 - Article
AN - SCOPUS:105017289175
SN - 1561-8633
VL - 25
SP - 2421
EP - 2435
JO - Natural Hazards and Earth System Sciences
JF - Natural Hazards and Earth System Sciences
IS - 7
ER -