Prediction of construction accident outcomes based on an imbalanced dataset through integrated resampling techniques and machine learning methods

Kerim Koc*, Ömer Ekmekcioğlu, Asli Pelin Gurgun

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)


Purpose: Construction accidents are central to the discipline of construction safety management. Although notable progress has been made in safety management applications over recent decades, the construction industry still accounts for a considerable percentage of all workplace fatalities worldwide. This study aims to predict occupational accident outcomes from national data using machine learning (ML) methods coupled with several resampling strategies.

Design/methodology/approach: An occupational accident dataset recorded in Turkey was collected. To deal with the class imbalance between nonfatal and fatal accidents, the dataset was pre-processed with random under-sampling (RUS), random over-sampling (ROS) and the synthetic minority over-sampling technique (SMOTE). Random forest (RF), Naïve Bayes (NB), K-nearest neighbor (KNN) and artificial neural networks (ANNs) were then employed as ML methods to predict accident outcomes.

Findings: The RF outperformed the other methods when the dataset was pre-processed with RUS. The permutation importance results obtained through the RF showed that the most significant attributes were the number of past accidents in the company, the worker's age, the material used, the number of workers in the company, the accident year and the time of the accident.

Practical implications: The proposed framework can be used on construction sites on a monthly basis to detect workers who have a high probability of experiencing fatal accidents, which can be a valuable decision-making input for safety professionals seeking to reduce the number of fatal accidents.

Social implications: Practitioners and occupational health and safety (OHS) departments of construction firms can focus on the most important attributes identified by the analysis to enhance workers' quality of life and well-being.

Originality/value: The literature on accident outcome prediction in the construction safety domain is limited in dealing with imbalanced datasets through integrated resampling techniques and ML methods. A novel utilization plan was proposed and enhanced by the analysis results.
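The abstract names three resampling strategies (RUS, ROS and SMOTE) but does not give implementation details. The following is a minimal, standard-library-only sketch of what each strategy does to the class counts, not the authors' code. The function names, the toy data and the choice of label 1 for the minority ("fatal") class are all assumptions made for illustration, and the SMOTE function is a simplified two-class variant that interpolates between minority rows only.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect feature rows under their class label."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_under_sample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the rarest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """ROS: duplicate random rows of each class up to the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote(X, y, minority_label, k=5, seed=0):
    """Simplified two-class SMOTE: create synthetic minority rows by
    interpolating between a minority row and one of its k nearest
    minority neighbours, until both classes have the same size."""
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        neighbours = sorted(
            (r for r in minority if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return X + synthetic, y + [minority_label] * len(synthetic)


# Toy imbalanced data (made up): 10 "fatal" rows (label 1), 90 "nonfatal" (label 0).
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(100)]
y = [1 if i < 10 else 0 for i in range(100)]

X_rus, y_rus = random_under_sample(X, y)
X_ros, y_ros = random_over_sample(X, y)
X_smote, y_smote = smote(X, y, minority_label=1)

print("original:", Counter(y))
print("RUS:     ", Counter(y_rus))
print("ROS:     ", Counter(y_ros))
print("SMOTE:   ", Counter(y_smote))
```

In a full pipeline, the resampled data would then be passed to a classifier such as a random forest. Resampling is normally applied to the training split only, so that the test set keeps the original class distribution.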

Original language: English
Pages (from-to): 4486-4517
Number of pages: 32
Journal: Engineering, Construction and Architectural Management
Issue number: 9
Publication status: Published - 27 Nov 2023

Bibliographical note

Publisher Copyright:
© 2022, Emerald Publishing Limited.


  • Artificial intelligence
  • Construction safety
  • Machine learning
  • Occupational accidents
  • Occupational health and safety (OHS)
  • Safety management

