Metaheuristic-Based Machine Learning System for Prediction of Compressive Strength based on Concrete Mixture Properties and Early-Age Strength Test Results

Accurately estimating concrete strength has become a critical issue in civil engineering. The 28-day concrete cylinder test results represent the characteristic strength of the concrete that was prepared and cast as part of the concrete work on a project. Waiting 28 days is important for quality control, even though it is a slow process. This research develops an advanced machine learning method to forecast concrete compressive strength from the concrete mix proportion and early-age strength test results. A total of 38 historical cases were used to build the prediction method. The results indicate that the hybrid machine learning strategy forecasts concrete strength with a comparatively high degree of accuracy, as measured by four error indicators. The proposed method can therefore help construction project managers in decision-making procedures that depend on early strength test results.


Introduction
In the construction field, the strength of concrete is a significant criterion when selecting a type of concrete for a specific purpose. Concrete continues to gain strength for an extended time after it has been poured. The nominal strength of concrete is defined by the compressive strength of the sample at 28 days. If the mix preparation or mix design on site was defective, the test results might show that the required strength was not reached, making it mandatory to repeat the entire procedure, which can be slow and expensive.
Any failure would require waiting another 28 days for the new test results; there is therefore a great need for the ability to determine the final strength of concrete at an early age. Consequently, a suitable, fast procedure for predicting the strength of concrete would be a major advancement in the construction industry [1]. The capability to forecast compressive strength early would enable contractors to quickly recognize possible weakness in the concrete and to decide whether to demolish the affected work or to continue with construction.
Additionally, reliably and quickly forecasting the outcome of a 28-day test, rather than waiting the full 28 days, would benefit all stakeholders, including both the user and the manufacturer.
Concrete behavior, including the forecasting of concrete strength, has long been an area of interest for researchers and is gaining even more interest today. Several recent studies have examined the behavior of concrete and the potential for improving estimates of characteristic strength. Numerous studies have focused on how concrete strength is affected by the mix; however, only a handful have concentrated on the connection between early testing and the 28-day compressive strength test. Furthermore, the majority of these studies have certain limitations, such as the lack of an advanced technique for assessing accuracy, the absence of a validation method, or reliance on a traditional approach only.
Machine learning techniques have been shown to outperform traditional techniques due to their excellent learning capabilities [2][3][4][5][6]. Traditional methods, such as linear regression and decision trees, are not sufficient to develop a suitable model with regard to accuracy and computation time. The least squares support vector regression (LSSVR) technique has evolved into an excellent machine learning approach and has been extensively applied in numerous fields because of its advantages. For example, Cheng and Prayogo applied an enhanced LSSVR technique to predict the permanent deformation behavior of asphalt mixtures [6]. Hoang et al. developed an optimized LSSVR model for groutability estimation of the grouting process [7]. Notably, the hyper-parameters of the LSSVR technique must be fine-tuned to enhance prediction accuracy.
This study develops a novel hybrid machine learning technique called symbiotic organisms search-least squares support vector regression (SOS-LSSVR). The main objective of this research is to develop an advanced, accurate method to improve early-age estimates of concrete strength. The SOS-LSSVR approach combines an accurate prediction method, least squares support vector regression (LSSVR), with a powerful and recent metaheuristic, symbiotic organisms search (SOS). The proposed model is compared with various prediction techniques to establish a prediction model of concrete strength based on early-age strength test results.

Least Squares Support Vector Regression
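The Basic Model
LS-SVR was first introduced by Suykens and Vandewalle [8] as a modification of the conventional support vector regression (SVR). The model describes the functional relationship between one or more independent variables and the response through the following regression function:

$$y(x) = w^{T}\varphi(x) + b$$

where $\varphi(\cdot)$ is the mapping to the high-dimensional feature space. In LS-SVR, given a training data set $\{x_k, y_k\}_{k=1}^{N}$, the optimization problem is formulated as follows:

$$\min_{w,b,e} J(w,e) = \frac{1}{2} w^{T} w + \frac{1}{2}\gamma \sum_{k=1}^{N} e_k^{2}$$

$$\text{subject to } y_k = w^{T}\varphi(x_k) + b + e_k, \quad k = 1, \dots, N$$

where $e_k \in R$ are error variables and $\gamma > 0$ denotes a regularization constant.
The objective function of this optimization problem consists of a regularization term and a sum of squared fitting errors. The Lagrangian is given by:

$$L(w,b,e;\alpha) = J(w,e) - \sum_{k=1}^{N} \alpha_k \left\{ w^{T}\varphi(x_k) + b + e_k - y_k \right\}$$

where $\alpha_k$ are Lagrange multipliers. The conditions for optimality are given by:

$$\begin{aligned}
\frac{\partial L}{\partial w} = 0 &\;\rightarrow\; w = \sum_{k=1}^{N} \alpha_k \varphi(x_k) \\
\frac{\partial L}{\partial b} = 0 &\;\rightarrow\; \sum_{k=1}^{N} \alpha_k = 0 \\
\frac{\partial L}{\partial e_k} = 0 &\;\rightarrow\; \alpha_k = \gamma e_k, \quad k = 1, \dots, N \\
\frac{\partial L}{\partial \alpha_k} = 0 &\;\rightarrow\; w^{T}\varphi(x_k) + b + e_k - y_k = 0, \quad k = 1, \dots, N
\end{aligned}$$

Eliminating $w$ and $e$ from these conditions, with $K(x_k, x_l) = \varphi(x_k)^{T}\varphi(x_l)$, gives the linear system:

$$\begin{bmatrix} 0 & 1_N^{T} \\ 1_N & K + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$$

The resulting LS-SVR model for function estimation is expressed as:

$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b$$

where $\alpha_k$ and $b$ are the solution to the linear system.
The kernel function that is often utilized is the radial basis function (RBF) kernel, which is given as:

$$K(x_k, x_l) = \exp\left( -\frac{\lVert x_k - x_l \rVert^{2}}{2\sigma^{2}} \right)$$

where $\sigma$ is the kernel function parameter.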
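As an illustration of how the model above can be computed, the following is a minimal sketch that assumes the standard LS-SVR linear system obtained by eliminating w and e from the optimality conditions. The function names (`rbf_kernel`, `lssvr_fit`, `lssvr_predict`) and the toy sine data are hypothetical and are not taken from the study.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """RBF kernel matrix: K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * sigma^2))."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))

def lssvr_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for alpha and b."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[1:], solution[0]  # alpha, b

def lssvr_predict(X_train, alpha, b, sigma, X_new):
    """Evaluate y(x) = sum_k alpha_k K(x, x_k) + b for every row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy data (assumption): a noise-free sine curve, not the concrete data set.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X[:, 0])

gamma, sigma = 100.0, 1.0  # illustrative hyper-parameter values
alpha, b = lssvr_fit(X, y, gamma, sigma)
y_hat = lssvr_predict(X, alpha, b, sigma, X)
print("training RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```

Because training reduces to one dense linear solve, the cost of a single fit is governed by the number of training samples, which is small for data sets of the size considered here.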

Model Selection in the Training Phase
Initially, the prediction model of the LSSVR is trained on a training set, which is a set of samples used to fit the parameters of the model. The test data set is employed to give an impartial assessment of the trained model. It is widely accepted that the generalization performance of a trained LSSVR model relies on fine-tuning of the hyper-parameters (γ and σ), a task referred to as "model selection". To obtain the best performance from the prediction model, these two hyper-parameters have to be set correctly.
The LSSVR parameter selection is commonly referred to as the model selection problem and can be formulated as an optimization problem. One common approach to model selection is an extensive grid search over the parameter domain. The parameter domain for a grid search needs to cover a large search space in order to include the global optimum. This makes the grid search computationally expensive, particularly on a large-scale training set. Consequently, a more advanced system is required to identify the best combination of LSSVR parameters.
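To make the cost argument concrete, the following sketch runs an exhaustive grid search over two hyper-parameters. The grid bounds, the grid resolution, and `toy_objective` are assumptions chosen for illustration; in practice every objective call would stand for a full train-and-validate cycle of the LSSVR.

```python
import itertools
import numpy as np

def grid_search(objective, gamma_grid, sigma_grid):
    """Evaluate every (gamma, sigma) pair and keep the one with the lowest score."""
    best_params, best_score = None, float("inf")
    n_evals = 0
    for gamma, sigma in itertools.product(gamma_grid, sigma_grid):
        score = objective(gamma, sigma)
        n_evals += 1
        if score < best_score:
            best_params, best_score = (gamma, sigma), score
    return best_params, best_score, n_evals

def toy_objective(gamma, sigma):
    """Hypothetical stand-in for a validation error surface (not from the study)."""
    return (np.log10(gamma) - 1.3) ** 2 + (np.log10(sigma) + 0.4) ** 2

# 25 log-spaced values per hyper-parameter (assumed range 1e-3 to 1e3).
gamma_grid = np.logspace(-3, 3, 25)
sigma_grid = np.logspace(-3, 3, 25)

best_params, best_score, n_evals = grid_search(toy_objective, gamma_grid, sigma_grid)
print(n_evals)  # 625 evaluations for a 25 x 25 grid
```

The number of evaluations grows with the product of the grid sizes, and the best point found can only be as good as the grid resolution allows, which is the motivation for a guided search.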
Research has shown that prediction accuracy increases when a metaheuristic algorithm is used as the optimizer for model selection, with improved prediction performance in various engineering problems [2,6,9,10]. Many studies have shown that the SOS algorithm performs better than other metaheuristic algorithms in finding optimal solutions to complex, nonlinear optimization problems [2,11]. Additionally, a past study successfully used the SOS algorithm as a self-tuning framework for optimizing the hyper-parameters of machine learning techniques and achieved better accuracy than a variety of hybrid techniques [12]. Therefore, this study uses SOS to enhance the prediction accuracy of LSSVR.

The Proposed Metaheuristic-based Machine Learning System
The search for optimality proves to be a demanding task in many optimization applications. Optimal solutions to problems have emerged in nature through evolution, where the process of natural selection has removed most inferior solutions. Past studies have revealed that nature-inspired metaheuristic algorithms such as SOS are effective in finding solutions to complicated optimization problems. Therefore, this study utilizes SOS to fine-tune the LSSVR hyper-parameters and thereby to achieve high prediction accuracy.

Symbiotic Organisms Search
The SOS algorithm was proposed by Cheng and Prayogo and has become one of the most popular metaheuristic algorithms in use today [2,13]. It is inspired by the symbiotic, dependency-based relationships commonly found among natural organisms. Similar to other typical metaheuristic algorithms, the SOS algorithm guides the candidate solutions through the optimization process using special search operators.
To begin with, an ecosystem matrix (population) is randomly generated by the SOS. The ecosystem size denotes the number of organisms in the ecosystem. Every matrix row corresponds to a virtual organism, which represents a candidate solution. Each virtual organism is associated with an objective value for the current problem. The search begins after the initial random ecosystem has been generated. The search process contains three phases, in which the organisms interact through mutualism, commensalism, and parasitism. An updated virtual organism replaces its pre-interaction counterpart only if its objective value has improved. In every iteration, once all phases are done, the best virtual organism is updated. Finally, the cycle of phases repeats until the termination criterion has been satisfied.
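The initialization and replacement rules described above can be sketched as follows for a minimization problem. The function names `init_ecosystem` and `greedy_replace`, the clipping of candidates to the bounds, and the sphere test function are assumptions made for this illustration.

```python
import numpy as np

def init_ecosystem(objective, lb, ub, eco_size, rng):
    """Create eco_size random organisms inside [lb, ub] and evaluate each one."""
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    ecosystem = lb + rng.random((eco_size, lb.size)) * (ub - lb)
    fitness = np.array([objective(organism) for organism in ecosystem])
    return ecosystem, fitness

def greedy_replace(ecosystem, fitness, i, candidate, objective, lb, ub):
    """Keep the candidate only if it improves on organism i (minimization)."""
    candidate = np.clip(candidate, lb, ub)  # assumption: out-of-range values are clipped
    candidate_fitness = objective(candidate)
    if candidate_fitness < fitness[i]:
        ecosystem[i] = candidate
        fitness[i] = candidate_fitness
        return True
    return False

def sphere(v):
    """Toy objective with its minimum at the origin (not from the study)."""
    return float(np.sum(v**2))

rng = np.random.default_rng(1)
lb, ub = np.array([-5.0, -5.0]), np.array([5.0, 5.0])
ecosystem, fitness = init_ecosystem(sphere, lb, ub, eco_size=10, rng=rng)
best_index = int(np.argmin(fitness))  # the current best virtual organism
print(ecosystem[best_index], fitness[best_index])
```

Each of the three interaction phases produces a candidate that is passed through the same acceptance test, so the objective value of every organism never gets worse from one iteration to the next.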
SOS uses three rules of symbiosis: (1) mutualism, in which both living organisms benefit from each other; (2) commensalism, in which one organism takes all the benefit from the other, while the other organism is not substantially affected by the interaction; and (3) parasitism, in which the advantage that one organism obtains is a disadvantage to the other organism. The mathematical model adapted for each of these symbioses is explained in the following subsections.

Mutualism Phase
The relationship in the mutualism phase is characterized by benefits to both sides. One such case is the relationship between bees and flowers. The following is the mathematical formulation of this phase:

$$newSol_i = currentSol_i + rand(0,1) \times (bestSol - mutualSol_{ij} \times BF_1)$$

$$newSol_j = currentSol_j + rand(0,1) \times (bestSol - mutualSol_{ij} \times BF_2)$$

Here $currentSol_i$ and $currentSol_j$ are two current virtual organisms involved in mutualism; $bestSol$ is the current best virtual organism; $rand(0,1)$ represents a uniform random value between 0 and 1; $mutualSol_{ij}$ models the mutualism interaction of the current virtual organisms; $newSol_i$ and $newSol_j$ are the updated virtual organisms following the interaction; and $BF_1$ and $BF_2$ are two benefit factors, each randomly set to either 1 or 2, that express the level of benefit each virtual organism receives. The following formulation is used to calculate $mutualSol_{ij}$:

$$mutualSol_{ij} = \frac{currentSol_i + currentSol_j}{2}$$
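A minimal sketch of the mutualism update is given below. It assumes that the benefit factors are drawn from {1, 2}, as in the original SOS formulation by Cheng and Prayogo, and that rand(0,1) is drawn independently for each dimension; both choices, as well as the function name `mutualism`, are assumptions of this sketch.

```python
import numpy as np

def mutualism(current_i, current_j, best, rng, bf_values=(1, 2)):
    """Return candidate replacements for organisms i and j after mutualism."""
    mutual = (current_i + current_j) / 2.0      # mutualSol_ij
    bf1, bf2 = rng.choice(bf_values, size=2)    # benefit factors BF1 and BF2
    new_i = current_i + rng.random(current_i.size) * (best - mutual * bf1)
    new_j = current_j + rng.random(current_j.size) * (best - mutual * bf2)
    return new_i, new_j

rng = np.random.default_rng(7)
current_i = np.array([2.0, 4.0])
current_j = np.array([4.0, 8.0])
best = np.array([3.0, 6.0])  # illustrative values only

new_i, new_j = mutualism(current_i, current_j, best, rng)
print(new_i, new_j)
```

Both candidates would then be accepted only if they improve on the organisms they replace.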

Commensalism Phase
In the commensalism phase, one virtual organism establishes a relationship in which it is the sole beneficiary, such as the relationship between sharks and remora fish. The following is a mathematical formulation for this phase:

$$newSol_i = currentSol_i + rand(-1,1) \times (bestSol - currentSol_j)$$

where $rand(-1,1)$ represents a uniform random value between -1 and 1.
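The commensalism update can be sketched in a few lines. The function name `commensalism` and the per-dimension random draw are assumptions of this sketch; the update rule follows the form commonly given for SOS.

```python
import numpy as np

def commensalism(current_i, current_j, best, rng):
    """Return a candidate for organism i; organism j is left unchanged."""
    step = rng.uniform(-1.0, 1.0, size=current_i.size)  # rand(-1, 1)
    return current_i + step * (best - current_j)

rng = np.random.default_rng(3)
current_i = np.array([1.0, 2.0])
current_j = np.array([4.0, 0.0])
best = np.array([2.0, 1.0])  # illustrative values only

new_i = commensalism(current_i, current_j, best, rng)
print(new_i)
```

Only organism i is modified, which mirrors the one-sided benefit that defines this phase.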

Parasitism Phase
The relationship in the parasitism phase is harmful to one side and beneficial to the other. To illustrate, the plasmodium parasite uses the anopheles mosquito to transfer itself from one human to another. The harmed side of this relationship will probably perish, whereas the beneficiary will become fitter. The following is a mathematical formulation for this phase:

$$parasiteSol_i = F \times currentSol_i + (1 - F) \times \left[ rand(0,1) \times (ub - lb) + lb \right]$$

Here $parasiteSol_i$ is the artificial parasite created from $currentSol_i$, and it threatens the existence of $currentSol_j$; $F$ and $(1-F)$ are the binary random matrix and its complement, respectively; $ub$ and $lb$ are the upper and lower bounds of the search area.

To eliminate the randomness in partitioning the training set, a k-fold cross-validation technique is used in this study [21]. In this procedure, k-fold cross-validation creates k non-overlapping subsets from the training set. Since k is an adjustable parameter, any adequate number will work. This study sets the value of k to 5. Hence, the data are split into five random groups of equal size, with four subsets employed as training subsets and one as a validation subset. In general, (k-1) subsets are employed for training the model, while the remaining subset is employed for validating the training process. The procedure is repeated k times so that every subset is used exactly once as the validation subset.
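The k-fold partitioning just described can be sketched as follows. The helper name `k_fold_indices` and the sample count of 30 are hypothetical; the size of the training set used in the study is not stated here.

```python
import numpy as np

def k_fold_indices(n_samples, k, rng):
    """Yield (train_idx, val_idx) pairs for k non-overlapping folds."""
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

rng = np.random.default_rng(42)
splits = list(k_fold_indices(n_samples=30, k=5, rng=rng))

for train_idx, val_idx in splits:
    # Four folds train the model, the remaining fold validates it.
    print(len(train_idx), len(val_idx))
```

Every sample appears in exactly one validation fold, so each observation contributes once to the validation error.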

Integrating the Metaheuristic in Model Selection for LSSVR
In this step, a hybrid system called the symbiotic organisms search-least squares support vector regression (SOS-LSSVR) is proposed, which combines the two techniques of SOS and LSSVR. As mentioned earlier, the LSSVR acts as a predictor that builds an accurate input-output relationship for the data set, and the SOS works to optimize the LSSVR hyper-parameters. The procedure of SOS-LSSVR is shown in Figure 1.

Figure 1. SOS-LSSVR Framework
The SOS is used to identify the optimal LSSVR parameters, and the predictive models are constructed accordingly in the training process. As mentioned previously, the training data set consists of two parts, the "training subset" and the "validation subset". The objective of this division is to reduce the chance of overfitting during the training procedure. The study chose the k-fold cross-validation process to avoid bias in the sample partitioning procedure. A supervised learning procedure of the LSSVR is used to train the model on the training subset for a given pair of LSSVR hyper-parameters, γ and σ.
Afterward, the fitted model is used to predict the target output for the validation subset. It is worth noting that the validation subset offers an unbiased assessment of a model fit on the training subset, and it is also used to tune the hyper-parameters of the model. The average mean squared error (AvgMSE) is employed to measure the average prediction error on the validation subsets:

$$AvgMSE = \frac{1}{k} \sum_{i=1}^{k} \left[ \frac{1}{n} \sum_{j=1}^{n} \left( y_j - y_j' \right)^{2} \right]$$

where $y_j$ and $y_j'$ are the j-th actual and predicted values, respectively; $n$ represents the total number of validation samples; and $k$ represents the total number of training simulations through k-fold cross-validation.
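As a small worked example of this objective, the sketch below averages the validation MSE over the folds. The function name `avg_mse` and the two-fold toy values are illustrative assumptions, not data from the study.

```python
import numpy as np

def avg_mse(fold_results):
    """Average the per-fold MSE over (actual, predicted) pairs, one pair per fold."""
    fold_mse = [
        np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)
        for actual, predicted in fold_results
    ]
    return float(np.mean(fold_mse))

# Two hypothetical validation folds with two samples each.
folds = [
    ([30.0, 32.0], [31.0, 32.0]),  # squared errors 1 and 0, fold MSE = 0.5
    ([28.0, 35.0], [28.0, 33.0]),  # squared errors 0 and 4, fold MSE = 2.0
]
print(avg_mse(folds))  # 1.25
```

In a hyper-parameter search this value would be returned for each candidate pair of γ and σ, and the optimizer would keep the pair with the lowest value.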

System Applications

Description of Data Set
The historical data set for the experiment was obtained from previous literature [22]. The data set has a total of 38 records of concrete mix proportions and was used to examine the behavior of the concrete mixture together with its strength at seven days. Every record has five input variables and one output variable: cement, fine aggregate (FA), coarse aggregate (CA), water-to-cement ratio (W/C), and the 7-day strength test result (fc7) as inputs, and the 28-day strength test result (fc28) as the output. The characteristics of the variables can be seen in Table 1.

Machine Learning Models for Comparison
To benchmark the SOS-LSSVR model, this research applies three widely used machine learning models, as follows:

Standard SVR Model
The SVR model used in this study is the traditional SVR model known as ɛ-SVR [23]. The standard SVR model performs the learning process using the entire training set, and its parameter values can be determined by some experimental rules. As suggested by Chang and Lin [23], the following parameter values are used in this study: C = 1 and  = 0.2.

Standard LSSVR Model
LSSVR is the least squares version of the standard SVR model. Rather than solving a convex quadratic programming problem as in classical SVR, LSSVR solves a set of linear equations. As suggested by Suykens and Vandewalle [8], the following parameter values are used in this study: γ = 1 and σ = 1.

Levenberg-Marquardt Backpropagation Neural Network (LM-BPNN) Model
BPNN is a variant of the neural network (NN). It is characterized by the fact that the computation of the gradient of the error function proceeds backwards through the network. To minimize the error, the procedure is repeated during learning, and the weights are adjusted by the backpropagation of error. In LM-BPNN, Levenberg-Marquardt optimization is used to update the bias and weight values. The following parameter values are used in this study: minimum performance gradient = 1E-07, initial μ = 0.001, μ decrease factor = 0.1, μ increase factor = 0.1, maximum μ = 1E+10.

Model Evaluation Methods
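Four different performance metrics are employed for evaluating the predictive methods, namely the correlation coefficient (R), root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE). They are applied to the predicted output results of the test data set. The following equations express the performance metrics:

$$R = \frac{n \sum y_i y_i' - \left( \sum y_i \right) \left( \sum y_i' \right)}{\sqrt{n \sum y_i^{2} - \left( \sum y_i \right)^{2}} \, \sqrt{n \sum y_i'^{2} - \left( \sum y_i' \right)^{2}}}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - y_i' \right)^{2}}$$

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - y_i'}{y_i} \right| \times 100\%$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - y_i' \right|$$

where $y_i$ and $y_i'$ are the i-th actual and predicted values, respectively, and $n$ is the total number of prediction samples.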
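The four metrics can be computed with a few lines of NumPy, as in the sketch below. It assumes the standard definitions of these metrics, with MAPE reported in percent and normalized by the actual values; the function name `evaluate` and the sample values are made up for illustration and are not results from the study.

```python
import numpy as np

def evaluate(y_actual, y_pred):
    """Return R, RMSE, MAPE (in percent), and MAE for paired value arrays."""
    y = np.asarray(y_actual, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    err = y - p
    return {
        "R": float(np.corrcoef(y, p)[0, 1]),             # Pearson correlation
        "RMSE": float(np.sqrt(np.mean(err**2))),
        "MAPE": float(np.mean(np.abs(err / y)) * 100.0),  # assumes no zero actual values
        "MAE": float(np.mean(np.abs(err))),
    }

# Hypothetical strengths in MPa.
y_actual = [20.0, 25.0, 40.0, 50.0]
y_pred = [22.0, 24.0, 38.0, 52.0]

metrics = evaluate(y_actual, y_pred)
print(metrics)  # MAE = 1.75 and MAPE = 5.75 for these values
```

A higher R and lower RMSE, MAPE, and MAE indicate a better model, so the metrics can be compared directly across the benchmark models.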

Prediction Results and Comparison
The training process is carried out for every machine learning model in this study. A test set is then applied to each trained model in order to evaluate the training of all developed models. Figure 3 shows the training and test results of the SOS-LSSVR prediction model when the final set of hyper-parameters is used. Table 2 shows the full prediction results for the machine learning models. The study applied four model performance metrics to the prediction results of every machine learning model. To further validate the modeling performance of the proposed system, the testing results were compared with those reported in the relevant literature. The deviations between actual and predicted output values for Rafi and Nasir's work [22] and for the proposed system are shown in Figure 4.

Conclusion
In this study, a new metaheuristic-based machine learning system called SOS-LSSVR was proposed to accurately estimate compressive strength from the concrete properties and the early-age test result. To investigate the performance of the proposed SOS-LSSVR system, three machine learning techniques were employed for benchmarking purposes. Laboratory tests of 38 samples obtained from past literature served as the experimental data set. The data set was divided into training and test sets, which were used for building and validating the prediction model. Subsequently, the training set was divided into a training subset and a validation subset generated by 5-fold cross-validation to avoid the overfitting problem. This study used four performance metrics (R, RMSE, MAPE, and MAE) to compare the overall performance of the proposed SOS-LSSVR with that of the other predictive techniques. According to the results, the proposed SOS-LSSVR has the highest accuracy for all mentioned performance metrics.
This study presents a significant contribution to the modelling of concrete behavior based on the mixture properties and the early-age test result. Because it predicts concrete behavior accurately, SOS-LSSVR helps concrete designers and users in decision-making processes that rely on early strength test results. As demonstrated by the analytical results, SOS-LSSVR is the most reliable of the compared models for accurately predicting the strength of a concrete mix from its proportions and early-age strength.
Partitioning the Training Data Set
For a machine learning technique, establishing a good prediction model is critical and requires a proper training and test process. To start with, a data set is used to build a prediction model in the training phase. The trained model is then evaluated on a new and unseen data set. Nevertheless, if the entire data set is used for training, an "overfitting" phenomenon may occur. In this circumstance, the trained model fits the training data very well but performs poorly on a new and unseen data set. To prevent the overfitting problem, dividing the training set into two subsets has become common practice. The larger portion of the training set is labelled the "training subset" and the smaller portion is labelled the "validation subset". The larger subset is used for training the model, while the smaller subset is employed to validate the model built.

Figure 2. The Process of Model Selection and Training Process of SOS-LSSVR

The comparison in Figure 4 revealed that, out of the total of 38 samples, SOS-LSSVR achieved a lower prediction deviation in 26 samples. Furthermore, the average prediction deviation of SOS-LSSVR (1.03 MPa) is lower than that of Rafi and Nasir (1.30 MPa).

Figure 4. Deviation between Actual and Predicted Output Values from Rafi and Nasir [22] and SOS-LSSVR

Table 1. Statistical Description of Concrete Mix Proportion

The best model outcome is indicated by the highest R value and the lowest MAE, MAPE, and RMSE values.
R measures the strength of the linear association between actual and predicted output values. The RMSE quantifies the difference between actual and predicted output values. The MAE gives the average magnitude of the errors between actual and predicted output values, disregarding the direction of the errors. The MAPE expresses the absolute errors relative to the actual values; unlike the MAE, it is therefore not affected by the size and unit of the output values. The error performance of every machine learning model on the test set is shown in Table 3.

Table 3. Comparative Prediction Results on Test Set