TY - JOUR
T1 - Importance of data preprocessing for neural networks modeling
T2 - The case of estimating the compaction parameters of soils
AU - Isik, Fatih
AU - Ozden, Gurkan
AU - Kuntalp, Mehmet
PY - 2012/4
Y1 - 2012/4
AB - In recent years, artificial neural networks (ANNs) have been successfully applied to a variety of engineering problems in order to discover the unknown phenomenon of the problem at hand. In the majority of these applications, ANNs were used to predict the non-linear relationship between the input variables and the corresponding target(s). Although ANNs have undeniable advantages, they are not faultless. One of their shortcomings arises at the preprocessing stage of modeling: the data preprocessing methodologies (i.e. data transformation and data division) have a significant effect on the performance of ANN models. This study examines the effect of four data transformation methods (i.e. statistical normalization, min-max normalization, non-linear transformation and whitening transformation) and two data division methods (i.e. random division and fuzzy c-means clustering) on the performance of ANN prediction models, using as a case study the prediction of the compaction parameters of both coarse- and fine-grained soils at the standard Proctor compaction energy level. The findings reveal that the raw data should be transformed by a data transformation method. They also show that the main data set should be subjected to clustering analysis and divided into training, testing and validation subsets by a systematic approach. The success of the preprocessing methods may vary for other neural network applications. However, this study shows the importance of data preprocessing to neural network modelers.
KW - Artificial neural networks
KW - Clustering analysis
KW - Data division
KW - Data transformation
KW - Fuzzy c-means clustering
UR - http://www.scopus.com/inward/record.url?scp=84861968599&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:84861968599
SN - 1308-772X
VL - 29
SP - 463
EP - 474
JO - Energy Education Science and Technology Part A: Energy Science and Research
JF - Energy Education Science and Technology Part A: Energy Science and Research
IS - 1
ER -