TY - JOUR
T1 - Generating synthetic data with variational autoencoder to address class imbalance of graph attention network prediction model for construction management
AU - Mostofi, Fatemeh
AU - Tokdemir, Onur Behzat
AU - Toğan, Vedat
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2024/10
Y1 - 2024/10
N2 - The predictive performance of machine learning (ML) models is challenged when trained on class-imbalanced real-world construction datasets, reducing the accuracy of relevant decisions. In construction projects, the collection of a balanced dataset is not always feasible. Here, the integration of generative and prediction models holds potential, synthesizing the underrepresented class and configuring a balanced input dataset. This study improves the performance of construction prediction models through the integration of a generative model that augments the dataset for the underrepresented class. For this, a variational autoencoder (VAE) was integrated into a multi-head graph attention network (GAT), whereby a comprehensive construction productivity dataset was collected across different projects related to different construction activities, each with a particular structure and level of class imbalance. Balancing the class distribution led to a significant increase in the predictive performance of the GAT model, where accuracy jumped from 90.6 % to 92.5 %, 81.1 % to 94.4 %, and 92.2 % to 95.4 % when trained on finishing, concrete, and insulation activity networks, respectively.
AB - The predictive performance of machine learning (ML) models is challenged when trained on class-imbalanced real-world construction datasets, reducing the accuracy of relevant decisions. In construction projects, the collection of a balanced dataset is not always feasible. Here, the integration of generative and prediction models holds potential, synthesizing the underrepresented class and configuring a balanced input dataset. This study improves the performance of construction prediction models through the integration of a generative model that augments the dataset for the underrepresented class. For this, a variational autoencoder (VAE) was integrated into a multi-head graph attention network (GAT), whereby a comprehensive construction productivity dataset was collected across different projects related to different construction activities, each with a particular structure and level of class imbalance. Balancing the class distribution led to a significant increase in the predictive performance of the GAT model, where accuracy jumped from 90.6 % to 92.5 %, 81.1 % to 94.4 %, and 92.2 % to 95.4 % when trained on finishing, concrete, and insulation activity networks, respectively.
KW - Class imbalance
KW - Construction productivity prediction
KW - Data augmentation
KW - Generative model
KW - Graph attention network (GAT)
KW - Machine learning (ML)
KW - Variational autoencoder (VAE)
UR - http://www.scopus.com/inward/record.url?scp=85193836048&partnerID=8YFLogxK
U2 - 10.1016/j.aei.2024.102606
DO - 10.1016/j.aei.2024.102606
M3 - Article
AN - SCOPUS:85193836048
SN - 1474-0346
VL - 62
JO - Advanced Engineering Informatics
JF - Advanced Engineering Informatics
M1 - 102606
ER -