
Accurate and timely monthly rainfall forecasting is a major challenge in hydrological research, with applications such as river management projects and the design of flood warning systems. Support Vector Regression (SVR) is a useful model for precipitation prediction. In this paper, a novel parallel co-evolution algorithm that hybridizes the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) is presented to determine appropriate SVR parameters for monthly rainfall prediction; the resulting model is named SVRGAPSO. The parallel co-evolutionary framework iterates a GA population and a PSO population simultaneously and exchanges information between them to avoid premature convergence to a local optimum. The proposed technique is applied to rainfall forecasting to test its generalization capability and is compared with three alternative methods: SVRPSO (SVR with PSO), SVRGA (SVR with GA), and the plain SVR model. The empirical results indicate that SVRGAPSO has superior generalization capability, with the lowest prediction errors in rainfall forecasting. The SVRGAPSO model is therefore a promising alternative for rainfall forecasting.

Monthly rainfall time series exhibit non-stationary characteristics; that is, their statistical distributions change over time. The structural changes of monthly rainfall may be caused by various atmospheric physical processes, such as changes in temperature, the pressure field and the sea temperature field. Accurate and timely monthly rainfall forecasting is therefore one of the most difficult problems of the hydrological cycle for both water quantity and quality management [

Although SVRs were proposed relatively recently as a technique for machine learning problems, the literature about them is vast and growing. When SVR is used for regression estimation, important research questions remain, such as how to choose the optimal parameters. Appropriate kernel parameters improve the accuracy of the SVR regression estimate, whereas inappropriate parameters lead to over-fitting or under-fitting in actual precipitation prediction. In practice, SVR hyper-parameters are often obtained by trial and error, so the effectiveness of SVR applications depends strongly upon the operator’s experience [

Recently, several studies have proposed optimizing the parameters of the Gaussian kernel function by evolutionary algorithms, such as Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) [

The present study proposes a novel parallel co-evolution algorithm that combines GA with PSO to optimize the SVR parameters, namely SVRGAPSO, based on a mechanism of information exchange between GA and PSO while they iterate over two populations. Our approach determines the optimal kernel parameter values for the SVR model in monthly rainfall forecasting. The rainfall of Nanning, Guangxi, China, is predicted as a case study to show the improvement in predictive accuracy and generalization capability achieved by the proposed SVRGAPSO model. Among the monthly rainfall forecasting models built with different approaches, the SVRGAPSO model achieves better generalization performance than the other regression estimation approaches. The rest of this study is organized as follows. Section 2 describes the ideas and procedures of SVRGAPSO. Section 3 applies different models to rainfall forecasting for comparison, and conclusions are drawn in the final section.

The basic ideas of SVR for regression are introduced here. Suppose we are given training data $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the input vector, $y_i$ is the output value and $N$ is the total number of training samples [

$$f(x) = \omega^{T}\phi(x) + b \tag{1}$$

where $f(x)$ denotes the forecast value; $\phi(x)$ denotes the high-dimensional feature space, which is non-linearly mapped from the input space $x$; and the weight vector $\omega$ and the bias $b$ are adjustable coefficients. The coefficients $\omega$ and $b$ can be estimated by minimizing the regularized risk function:

$$\min_{\omega, b, \xi^{*}, \xi} R_{\varepsilon}(\omega, \xi^{*}, \xi) = \frac{1}{2}\omega^{T}\omega + C\sum_{i=1}^{N}\left(\xi_i^{*} + \xi_i\right) \tag{2}$$

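subject to

$$\begin{cases} y_i - \omega \cdot \phi(x_i) - b \le \varepsilon + \xi_i^{*} \\ -y_i + \omega \cdot \phi(x_i) + b \le \varepsilon + \xi_i \\ \xi_i^{*} \ge 0, \quad \xi_i \ge 0 \end{cases} \qquad i = 1, 2, 3, \cdots, N \tag{3}$$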

Therefore, the objective of SVR is to include training patterns inside an $\varepsilon$-insensitive tube while keeping the norm $\|\omega\|$ as small as possible. The parameter $\varepsilon$ is the largest difference between actual values and values calculated from the regression function that is tolerated without penalty; this difference can be viewed as a tube around the regression function. $C$ is a parameter that determines the trade-off between the empirical risk and the model flatness. After the quadratic optimization problem with inequality constraints is solved, the SVR is given by:

$$f(x, \alpha_i, \alpha_i^{*}) = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^{*}\right) K(x, x_i) + b \tag{4}$$

where $\alpha_i$ and $\alpha_i^{*}$ are the Lagrangian multipliers associated with the constraints, and $K(x, x_i)$ is called the kernel function. As the kernel function defines the feature space in which the decision function is constructed, exploring useful kernel functions constitutes a significant topic in SVR applications. The most widely used kernel function is the Gaussian radial basis function (RBF) with the parameter $\sigma$:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^{2}}{2\sigma^{2}}\right) \tag{5}$$

By using the kernel functions, SVR can efficiently and effectively construct many types of nonlinear functions to compute the dot product in feature space for regression estimation. Gaussian RBF kernel is not only easier to implement but also capable of non-linearly mapping the training data into an infinite dimensional space. Thus, it is suitable to deal with a nonlinear relationship. Therefore, the Gaussian RBF kernel function is specified in this study.
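As a minimal sketch of how the kernel expansion is evaluated, the following pure-Python functions compute the Gaussian RBF kernel and the prediction of Equation (4). The kernel is assumed to take the usual Gaussian form with a negative exponent, so that it decays with distance. The support vectors, coefficients and bias below are made-up illustrative values, not fitted ones.

```python
import math

def rbf_kernel(x_i, x_j, sigma):
    """Gaussian RBF kernel: exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, dual_coefs, b, sigma):
    """Equation (4): f(x) = sum_i (alpha_i - alpha_i^*) K(x, x_i) + b.

    dual_coefs[i] holds the difference alpha_i - alpha_i^*.
    """
    return sum(c * rbf_kernel(x, sv, sigma)
               for c, sv in zip(dual_coefs, support_vectors)) + b

# Illustrative (hypothetical) values, not taken from the paper.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]

print(rbf_kernel([0.0, 0.0], [0.0, 0.0], sigma=1.0))  # 1.0
print(svr_predict([0.0, 0.0], support_vectors, dual_coefs, b=0.1, sigma=1.0))
```

A small $\sigma$ makes the kernel fall off quickly, so each support vector influences only nearby inputs; a large $\sigma$ makes the fitted function smoother.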

SVR based on the radial basis kernel function has three parameters to be determined: C controls the trade-off between the model flatness and the degree of training error, ε is the width of the insensitive loss function, and σ is the bandwidth of the Gaussian kernel function. For example, if C is too large (approaching infinity), the objective is to minimize the empirical risk only. Parameter ε controls the width of the ε-insensitive zone and thereby the number of support vectors (SVs) employed in the regression. A larger ε value implies fewer SVs; thus, the regression function is simpler [
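The role of ε can be illustrated with a short sketch of the ε-insensitive loss. Counting the residuals that fall outside the tube is used here only as a rough proxy for the number of support vectors; the observed and predicted values are invented for the example.

```python
def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def count_outside_tube(y_true, y_pred, epsilon):
    """Number of samples whose absolute residual exceeds epsilon."""
    return sum(1 for y, p in zip(y_true, y_pred) if abs(y - p) > epsilon)

# Hypothetical observations and predictions (residuals 1.0, 0.5, 3.0, 0.2).
observed = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 12.5, 18.0, 19.8]

for eps in (0.1, 1.0, 5.0):
    print(eps, count_outside_tube(observed, predicted, eps))
# 0.1 4
# 1.0 1
# 5.0 0
```

Widening the tube leaves fewer samples outside it, which is the sense in which a larger ε gives a sparser, simpler regression function.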

As mentioned above, there is no structured method or consensus for setting SVR parameters efficiently. Recently, several authors have applied a series of search algorithms to test their potential and suitability for parameter selection of an SVR model. However, most of the evolutionary algorithms employed lack a knowledge memory function, which makes the search time-consuming and prone to premature convergence to a local optimum. Therefore, the GAPSO algorithm is used in the proposed SVR model to optimize the parameter selection.

Genetic algorithm is an adaptive optimization technique developed by Holland based on natural evolution and survival of the fittest, and works on a population of individuals [

PSO, which has been used to solve real-time problems and has aroused researchers’ interest due to its flexibility and efficiency, is a stochastic, population-based optimization algorithm introduced by James Kennedy and Russell C. Eberhart [

The concept of co-evolution was first proposed by Ehrlich and Raven, who discussed the evolution between plants and herbivorous insects [

In this paper, the real-valued chromosome of GAPSO encodes the parameters directly, which saves much computation time. The chromosome comprises two parts: the SVR parameters and the kernel parameter. The real-valued genes $\{C_i, \varepsilon_i\}$ represent the values of the penalty parameter and the width of the insensitive loss function, respectively, and the real-valued gene $\{\sigma_i\}$ represents the value of the kernel parameter. A fitness function assessing the performance of every individual must be designed before searching for the optimal values of the SVR parameters. The performance of a parameter set is measured by the mean absolute percentage error (MAPE) on the last subset. Averaging the MAPE over the N trials gives an estimate of the expected generalization error for the training sets, as given by Equation (6):

$$F_{fitness}(x_1, x_2, \cdots, x_N) = \frac{1}{1 + \dfrac{1}{N}\sum\limits_{i=1}^{N}\dfrac{|y_i - \hat{y}_i|}{y_i} \times 100\%} \tag{6}$$

where $x_i$ are the training samples, $N$ is the number of training data samples, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value. The optimal parameter setting is critical to the predictive performance of the SVR model. In this paper, a parallel co-evolutionary algorithm based on GA combined with PSO is employed to simultaneously optimize the SVR parameters and the kernel function parameter; the resulting model is named SVRGAPSO.
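The fitness of Equation (6) can be sketched in a few lines. This is an illustrative reading of the formula, with MAPE expressed in percent; it assumes that every actual value is nonzero (a zero-rainfall month would need special handling) and uses the absolute value of the actual value in the denominator as a safeguard. The function names are hypothetical.

```python
def mape_percent(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is 0."""
    n = len(actual)
    return 100.0 * sum(abs(y - p) / abs(y) for y, p in zip(actual, predicted)) / n

def fitness(actual, predicted):
    """Equation (6): 1 / (1 + MAPE). Equals 1.0 for a perfect fit and
    decreases towards 0 as the percentage error grows."""
    return 1.0 / (1.0 + mape_percent(actual, predicted))

# Hypothetical monthly totals: both forecasts are off by 10 %.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape_percent(actual, predicted))
print(fitness(actual, predicted))
```

Because the fitness is a decreasing function of MAPE, maximizing it in GA and PSO is equivalent to minimizing the percentage error of the SVR on the evaluation subset.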

Step 1: Generate the initial populations. Two initial populations are randomly generated according to the target database. POP_{1} and POP_{2} use the search strategies of PSO and GA, respectively, to search for association rules. The two populations use the same coding rules, fitness function, population size and maximum number of generations. This paper uses real coding, in which the number of elements in an array of real numbers corresponds to the number of fields in the transaction database, and the element values represent the attribute values of the fields.

Step 2: Initialize the two populations with the GA and PSO parameters: number of iterations, crossover probability, mutation probability, particle velocities and particle positions.

Step 3: Input the training data and calculate the fitness according to Equation (6); G_{best} and P_{best} are determined by a simple comparison of fitness values. The fitness value of the global best individual Gpso in POP_{1} is compared with that of the best individual Gga in POP_{2}. The individual with the larger fitness value replaces the best individual of the other population, as a basis for the next generation of evolution. The adjustment strategy of the crossover probability is shown in Equation (7):

$$P_c(x_1, x_2, \cdots, x_N) = \begin{cases} \dfrac{P_{c\max} - P_{c\min}}{1 + \exp\left(2\,\dfrac{f' - \bar{f}}{f_{\max} - \bar{f}}\right)}, & f' \ge \bar{f} \\ P_{c\max}, & f' < \bar{f} \end{cases} \tag{7}$$

where $P_{c\max} = 0.9$ and $P_{c\min} = 0.3$ respectively denote the upper and lower limits of the crossover probability $P_c$, $f_{\max}$ is the maximum fitness value of individuals in the current population, $\bar{f}$ is the average fitness value of the current population, and $f'$ is the larger fitness value of the two individuals to be crossed. In this paper, the mutation probability is related to the iteration number. The adjustment strategy of the mutation probability is shown in Equation (8):

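$$P_m = \begin{cases} P_{m\min} + \dfrac{t}{T_{\max}}, & 0 \le \dfrac{t}{T_{\max}} \le \left(P_{m\max} - P_{m\min}\right) \\ \dfrac{P_{m\max}}{P_{m\max} - P_{m\min} - 1} \cdot \dfrac{t}{T_{\max}} + \dfrac{P_{m\max}}{P_{m\min} - P_{m\max} + 1}, & \left(P_{m\max} - P_{m\min}\right) \le \dfrac{t}{T_{\max}} \le 1 \end{cases} \tag{8}$$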

where $P_{m\max} = 0.1$ and $P_{m\min} = 0.001$ are the upper and lower limits of $P_m$, $T_{\max}$ is the maximum number of iterations, and $t$ is the current iteration number. In this paper, the standard PSO algorithm is used for the search over continuous functions. See literature [
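The two adaptive schedules can be sketched as follows. This is one reading of Equations (7) and (8), not a definitive implementation: the crossover rule is transcribed as a logistic function of the normalized fitness, and the constant term of the second mutation branch is assumed to be $P_{m\max}/(P_{m\min} - P_{m\max} + 1)$, which makes the two branches meet at $P_{m\max}$ and lets the probability fall linearly to zero at $t = T_{\max}$. The handling of the degenerate case $f_{\max} = \bar{f}$ is also an assumption.

```python
import math

PC_MAX, PC_MIN = 0.9, 0.3    # limits of the crossover probability
PM_MAX, PM_MIN = 0.1, 0.001  # limits of the mutation probability

def crossover_probability(f_prime, f_avg, f_max):
    """Equation (7): below-average pairs always use PC_MAX; above-average
    pairs get a probability that shrinks as f_prime approaches f_max."""
    if f_prime < f_avg:
        return PC_MAX
    # Assumption: treat a population with f_max == f_avg as ratio 0.
    ratio = 0.0 if f_max == f_avg else (f_prime - f_avg) / (f_max - f_avg)
    return (PC_MAX - PC_MIN) / (1.0 + math.exp(2.0 * ratio))

def mutation_probability(t, t_max):
    """Equation (8): rise with slope 1 from PM_MIN, then decay linearly to 0."""
    x = t / t_max
    knee = PM_MAX - PM_MIN
    if x <= knee:
        return PM_MIN + x
    return PM_MAX / (knee - 1.0) * x + PM_MAX / (1.0 - knee)

for t in (0, 5, 50, 100):
    print(t, round(mutation_probability(t, 100), 4))
```

Under this reading, mutation is strongest early in the run, when exploration matters most, and is switched off by the final iteration.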

Step 4: Check whether the termination condition is met. If the number of iterations has reached the maximum, the algorithm ends and outputs the result as described in Step 5; otherwise, it continues with the update in Step 5.

Step 5: The velocities and positions of POP_{1} are updated in accordance with PSO, and POP_{2} is evolved by GA, to produce the next generation. Once the termination condition is met, the best solution is output as the optimal parameter setting for the SVR model. The test samples are then input to evaluate the prediction performance of the SVR model.
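The steps above can be sketched as a small, self-contained program. It is only an illustration of the exchange mechanism, not the authors' MATLAB implementation: a toy objective with a made-up optimum stands in for the SVR fitness of Equation (6), the search bounds are hypothetical, fixed crossover and mutation probabilities replace the adaptive ones, and the tournament selection, blend crossover and linearly decreasing inertia weight are assumed design choices.

```python
import random

# Hypothetical search ranges for (C, epsilon, sigma); not taken from the paper.
BOUNDS = [(0.1, 100.0), (0.001, 1.0), (0.01, 10.0)]
TARGET = (10.0, 0.1, 1.0)  # made-up optimum of the toy objective

def toy_fitness(p):
    """Stand-in for Equation (6): larger is better, 1.0 at TARGET."""
    return 1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(p, TARGET)))

def random_individual(rng):
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]

def pso_step(rng, swarm, vel, pbest, gbest, w, c=2.0):
    """Standard PSO velocity/position update, positions clipped to BOUNDS."""
    for i, x in enumerate(swarm):
        for d, (lo, hi) in enumerate(BOUNDS):
            vel[i][d] = (w * vel[i][d]
                         + c * rng.random() * (pbest[i][d] - x[d])
                         + c * rng.random() * (gbest[d] - x[d]))
            x[d] = min(hi, max(lo, x[d] + vel[i][d]))
        if toy_fitness(x) > toy_fitness(pbest[i]):
            pbest[i] = list(x)

def ga_step(rng, pop, pc=0.8, pm=0.05):
    """One GA generation: elitism, binary tournaments, blend crossover, mutation."""
    children = [list(max(pop, key=toy_fitness))]  # keep the best individual
    while len(children) < len(pop):
        a = max(rng.sample(pop, 2), key=toy_fitness)
        b = max(rng.sample(pop, 2), key=toy_fitness)
        child = list(a)
        if rng.random() < pc:
            mix = rng.random()
            child = [mix * u + (1.0 - mix) * v for u, v in zip(a, b)]
        for d, (lo, hi) in enumerate(BOUNDS):
            if rng.random() < pm:
                child[d] = rng.uniform(lo, hi)
        children.append(child)
    return children

def coevolve(seed=0, size=40, generations=100):
    rng = random.Random(seed)
    swarm = [random_individual(rng) for _ in range(size)]   # POP1, searched by PSO
    vel = [[0.0] * len(BOUNDS) for _ in range(size)]
    pbest = [list(x) for x in swarm]
    ga_pop = [random_individual(rng) for _ in range(size)]  # POP2, searched by GA
    history = []
    for g in range(generations):
        g_pso = max(pbest, key=toy_fitness)
        g_ga = max(ga_pop, key=toy_fitness)
        # Information exchange: the fitter best replaces the other population's best.
        if toy_fitness(g_pso) >= toy_fitness(g_ga):
            ga_pop[ga_pop.index(g_ga)] = list(g_pso)
            gbest = list(g_pso)
        else:
            pbest[pbest.index(g_pso)] = list(g_ga)
            gbest = list(g_ga)
        history.append(toy_fitness(gbest))
        w = 0.9 - (0.9 - 0.1) * g / generations  # assumed linear inertia decay
        pso_step(rng, swarm, vel, pbest, gbest, w)
        ga_pop = ga_step(rng, ga_pop)
    best = max(pbest + ga_pop, key=toy_fitness)
    return best, history

best, history = coevolve()
print("best (C, epsilon, sigma):", [round(v, 3) for v in best])
print("fitness:", round(toy_fitness(best), 4))
```

In a real application, `toy_fitness` would train an SVR with the candidate (C, ε, σ) and return the fitness of Equation (6) on the evaluation data. Because each population receives the other's best individual whenever that individual is fitter, a population that stalls in a local optimum is pulled towards the better region found by the other.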

The SVRGAPSO approach was developed on a PC with the following features: Intel Core i7-8550U 1.80 GHz CPU, 32.0 GB RAM, Windows 10 operating system and the MATLAB R2019a development environment. The GA and PSO parameters are set as follows: the number of iterations is 100; the population size is 40; the crossover probability is 0.80; the mutation probability is 0.05; the minimum inertia weight is 0.1; the maximum inertia weight is 0.9; and the learning rate is 2.0.

Monthly ground rainfall data from January 1952 to December 2017 were obtained from the Guangxi Meteorological Bureau in Nanning, Guangxi, China. The data set contains 792 data points: the training set contains 480 points (1952-1991) for modeling, the validation set contains 240 points (1992-2011) for model validation, and the remaining 72 points (2012-2017) are used to test the predictive performance of the model.
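The chronological split can be checked with a short sketch. Month labels stand in for the rainfall values, which are not reproduced here; the function name is hypothetical.

```python
def chronological_split(series, n_train=480, n_val=240):
    """Split a time-ordered series into training, validation and test parts
    without shuffling, so that the test period is strictly the most recent."""
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test

# (year, month) labels for January 1952 to December 2017.
months = [(year, month) for year in range(1952, 2018) for month in range(1, 13)]
train, val, test = chronological_split(months)

print(len(months), len(train), len(val), len(test))  # 792 480 240 72
print(train[-1], val[0], val[-1], test[0])
# (1991, 12) (1992, 1) (2011, 12) (2012, 1)
```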

The selection of independent variables is very important for a rainfall forecasting model. In this paper, the variable selection method most commonly used in operational meteorology is adopted to select the predictive factors. First, the candidate forecasting factors are selected from the numerical forecast products based on the 96 h forecast field, which includes the 17 conventional meteorological elements and physical elements from the T213 numerical products of the China Meteorological Administration. The data cover latitudes from 15°N to 30°N and longitudes from 100°E to 120°E at 1° × 1° resolution, giving 336 grid points altogether. From these, 76 variables are obtained as the main forecasting factors.
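As a quick consistency check on the grid description, the sketch below enumerates the grid under the assumption that the domain is 15°N to 30°N and 100°E to 120°E at 1° spacing with both end points included; this reading reproduces the stated total of 336 grid points.

```python
# Assumed domain: 15N-30N and 100E-120E, 1 degree spacing, end points included.
lats = range(15, 31)
lons = range(100, 121)
grid = [(lat, lon) for lat in lats for lon in lons]

print(len(lats), len(lons), len(grid))  # 16 21 336
```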

Principal component analysis is then used to obtain 12 variables as the SVR input. The original meteorological data are used as the actual output.

This paper uses the following evaluation metrics to measure the performance of the proposed model: root mean square error (RMSE), mean absolute percentage error (MAPE) and coefficient of efficiency (CE), which can be found in many papers [
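For reference, the three metrics can be sketched as below. The formulas are not spelled out in the text, so the usual definitions are assumed; in particular, CE is taken to be the Nash-Sutcliffe coefficient of efficiency common in hydrology, and MAPE assumes that no observed value is zero. The sample values are invented.

```python
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mape(obs, pred):
    """Mean absolute percentage error in percent (observations must be nonzero)."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)

def ce(obs, pred):
    """Coefficient of efficiency, assumed here to be Nash-Sutcliffe:
    1 - sum((o - p)^2) / sum((o - mean(o))^2). 1 is a perfect fit;
    0 means no better than predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

obs = [2.0, 4.0, 6.0, 8.0]        # hypothetical observations
pred_mean = [5.0, 5.0, 5.0, 5.0]  # always predicting the mean

print(rmse(obs, obs), mape(obs, obs), ce(obs, obs))  # 0.0 0.0 1.0
print(ce(obs, pred_mean))                            # 0.0
```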

For building the SVRPSO and SVRGA rainfall forecasting models, PSO is used to search for the optimal parameter values of SVR for rainfall forecasting, following the method presented by Chen K., et al. [

four different models in Nanning, Guangxi from January 1992 to December 2011.

From the graphs and tables, we can generally see that the SVRGAPSO algorithm enables the solution to jump out of local optima and decreases the oscillation near the end of the search through information exchange between the GA and PSO populations. The forecasting results are very promising for the monthly rainfall under study either where the measurement of forecasting performance is goodness of fit such as RMSE (refer to

Clearly, RMSE is not the only criterion for measuring prediction accuracy. Goodness of fit is only one of the most important criteria for evaluating models, and MAPE is the criterion for measuring the relative performance of a model for monthly rainfall forecasting. The training, validation and forecasting performance comparisons of the various models for the rainfall via RMSE, MAPE and CE are reported in

Data set | Model | RMSE | MAPE (%) | CE |
---|---|---|---|---|
Train | SVR | 26.324 | 49.632 | 0.967 |
Train | SVRGA | 22.587 | 40.217 | 0.989 |
Train | SVRPSO | 21.110 | 22.357 | 0.987 |
Train | SVRGAPSO | 3.755 | 5.632 | 0.998 |
Validation | SVR | 28.677 | 51.496 | 0.963 |
Validation | SVRGA | 23.795 | 47.246 | 0.976 |
Validation | SVRPSO | 21.580 | 23.362 | 0.980 |
Validation | SVRGAPSO | 3.782 | 5.759 | 0.999 |
Testing | SVR | 21.608 | 24.767 | 0.976 |
Testing | SVRGA | 26.335 | 22.482 | 0.965 |
Testing | SVRPSO | 16.240 | 20.649 | 0.985 |
Testing | SVRGAPSO | 12.499 | 8.552 | 0.991 |

The RMSE of SVRGAPSO is the smallest among all models. Focusing on the RMSE indicator for the testing samples, our proposed SVR based on the parallel co-evolutionary algorithm performs best in all cases, followed by the SVRPSO and SVR techniques; SVRGA is the worst from a general point of view.

Similarly, for the validation data, the RMSE of SVR is 28.677, that of SVRGA is 23.795 and that of SVRPSO is 21.580, while the RMSE of SVRGAPSO reaches 3.782, the smallest of all. The testing results lead to the same conclusion: the RMSE of SVRGAPSO is also the smallest among all models. The main reason is that the GA is trapped in a local optimal solution and cannot find the optimal parameters of SVR.

Focusing on the MAPE indicator for the training, validation and testing data, the MAPE of the SVRGAPSO model is also lower than that of the SVR, SVRGA and SVRPSO models; that is, the deviations between observed and forecast values are the smallest for SVRGAPSO. However, a low RMSE does not necessarily mean a high hit rate in predicting the direction of monthly rainfall movement. Thus, the CE comparison is necessary. The CE indicator is more important than RMSE and MAPE here, because CE reflects the trend of the model and is mainly used to judge whether the trend of the forecast results is consistent with the actual precipitation trend. The CE of the SVRGAPSO model is the highest of the four models in all stages. These results show that the SVRGAPSO model has a higher correlation with observed rainfall values, which also implies that it is capable of capturing the average change tendency of monthly rainfall data. To summarize, the SVRGAPSO model is superior to the other three models presented here in terms of RMSE, MAPE and CE for rainfall prediction under the same input.
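The ranking described above can be reproduced directly from the testing rows of the results table. The sketch below only rearranges the reported numbers; the relative RMSE reduction is a derived illustration, not a figure stated in the paper.

```python
# (RMSE, MAPE %, CE) for the testing period, copied from the results table.
testing = {
    "SVR": (21.608, 24.767, 0.976),
    "SVRGA": (26.335, 22.482, 0.965),
    "SVRPSO": (16.240, 20.649, 0.985),
    "SVRGAPSO": (12.499, 8.552, 0.991),
}

by_rmse = sorted(testing, key=lambda m: testing[m][0])  # lowest error first
best_ce = max(testing, key=lambda m: testing[m][2])     # highest CE

def rmse_reduction_percent(baseline, model="SVRGAPSO"):
    """Relative testing-RMSE reduction of `model` with respect to `baseline`."""
    base = testing[baseline][0]
    return 100.0 * (base - testing[model][0]) / base

print(by_rmse)  # ['SVRGAPSO', 'SVRPSO', 'SVR', 'SVRGA']
print(best_ce)  # SVRGAPSO
for name in ("SVR", "SVRGA", "SVRPSO"):
    print(name, round(rmse_reduction_percent(name), 1))
```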

The main reason is that GA and PSO used alone easily fall into local optima and cannot evolve to the optimal parameters. SVR is also prone to over-fitting when tuned through cross-validation, resulting in poor prediction results. In the iterative process of GAPSO, GA and PSO exchange information between the two populations; the co-evolution algorithm is therefore not only superior in solution quality but also has a significant advantage in jumping out of local optima and avoiding premature convergence. The global optimum of the SVR parameters is thus obtained with greater probability.

From the experiments presented in this paper we can draw the following conclusions. The experimental results show that the SVRGAPSO monthly rainfall forecasting model is superior to the pure SVR model, the SVRGA model and the SVRPSO model for the training, validation and test cases of monthly rainfall in terms of RMSE, MAPE and CE, as can be seen from

The rainfall system is one of the most active dynamic weather systems. This paper presents a parallel co-evolution algorithm in which GA and PSO exchange information between their two populations during evolutionary iteration in order to optimize the parameters of SVR for rainfall forecasting. In terms of empirical results, we find that across the different models and evaluation criteria, our proposed SVRGAPSO forecasting technique performs best for the test cases of monthly rainfall. In all testing cases, the RMSE of the proposed technique is the lowest and the CE is the highest, indicating that the SVRGAPSO forecasting technique can be used as a viable solution for monthly rainfall time series forecasting.

The authors would like to express their sincere thanks to the editor and the anonymous reviewers for their comments and suggestions for the improvement of this paper. This work was supported in part by the Natural Science Foundation of China under Grant No. 41575051 and No. 41565005, by the Science and Technology Foundation of Guangxi Province under Grant No. AD16450003 and No. 2018AB14003, by the Guangxi Education Department under Grant 2019KY0863, 2017KY0896 and KY2016YB554, and by the Key Disciplines for Operational Research and Cybernetics of the Education Department of Guangxi Province.

The authors declare no conflicts of interest regarding the publication of this paper.

Wu, J.S. and Xie, Y.S. (2019) Hybrid Support Vector Regression with Parallel Co-Evolution Algorithm Based on GA and PSO for Forecasting Monthly Rainfall. Journal of Software Engineering and Applications, 12, 524-539. https://doi.org/10.4236/jsea.2019.1212032