TY - JOUR
T1 - Exploring the additional value of class imbalance distributions on interpretable flash flood susceptibility prediction in the Black Warrior River basin, Alabama, United States
AU - Ekmekcioğlu, Ömer
AU - Koc, Kerim
AU - Özger, Mehmet
AU - Işık, Zeynep
N1 - Publisher Copyright: © 2022 Elsevier B.V.
PY - 2022/7
Y1 - 2022/7
N2 - This study proposes a novel flash flood susceptibility prediction framework with a particular emphasis on the extent of imbalance between the number of flooding and non-flooding events, as the majority of events result in non-flooding. The class imbalance issue and the magnitude of the imbalance were explored to highlight the uncertain nature of the flooding phenomenon. To this end, Random Forest (RF) was first adopted to evaluate five class imbalance scenarios (i.e., 1x, 10x, 25x, 50x, and 100x non-flood events for each x flood events). The parameter configurations of the developed models were determined with a state-of-the-art metaheuristic, the Cuckoo Search (CS) algorithm. The CS-RF model achieved its highest predictive capability, an area under the receiver operating characteristic curve (AUROC) of 0.8455, when the extent of imbalance was set to 50x. The CS-RF model was then benchmarked against another bagging algorithm, Extra Trees, and two boosting algorithms, Adaptive Boosting (AdaBoost) and eXtreme Gradient Boosting (XGBoost), all integrated with the CS technique. The results showed that CS-RF is the most promising tree-based machine learning technique for flash flood susceptibility projection in the selected study area. Based on the predictions, a flash flood susceptibility map was generated, on which 9.35% of the basin fell under very high flash flood risk. A recently developed model-agnostic, game-theoretical method, SHapley Additive exPlanations (SHAP), was used to dissect the flash flood conditioning factors and highlight the contribution of each feature to the predicted outcome, ensuring the transparency of the model findings. Overall, this study contributes to both theory and practice with a particular focus on model interpretability and the imbalance in the occurrence of flash flood events, assisting decision-makers in enhancing strategies to combat the hazardous impacts of floods.
KW - Artificial intelligence
KW - Flash flood susceptibility
KW - Flood risk management
KW - Geographic information system (GIS)
KW - Imbalance data
KW - Machine learning
KW - SHapley Additive exPlanations (SHAP)
UR - http://www.scopus.com/inward/record.url?scp=85129508388&partnerID=8YFLogxK
U2 - 10.1016/j.jhydrol.2022.127877
DO - 10.1016/j.jhydrol.2022.127877
M3 - Article
AN - SCOPUS:85129508388
SN - 0022-1694
VL - 610
JO - Journal of Hydrology
JF - Journal of Hydrology
M1 - 127877
ER -