Contamination degree prediction of insulator surface based on exploratory factor analysis-least square support vector machine combined model

This study presents a combined model based on exploratory factor analysis (EFA) and the least square support vector machine (LSSVM) to predict the contamination degree of insulator surfaces. Firstly, the EFA method is used to reduce the numerous influence factor variables of insulator contamination to a few factor variables, which decreases the complexity of the model. Then, taking these factor variables as new input variables, an LSSVM model is established to predict the insulator contamination degree. To obtain the optimal predictive value, the non-dominated sorting genetic algorithm II (NSGA-II) is applied to optimise the LSSVM model parameters. The proposed EFA-LSSVM combined model is compared with LSSVM, back propagation neural network and multiple linear regression models in terms of model performance. Results indicate that the EFA-LSSVM combined model effectively overcomes the shortcomings of the other three models in computational time, prediction accuracy and generalization ability. Finally, the feasibility of the proposed model in predicting contamination degree of insulator surface is verified by adopting the radar map of


| INTRODUCTION
Insulators are important components of electric power transmission systems, and their external insulation characteristics under contamination have a significant impact on the security and reliability of power systems [1][2][3][4]. During long-term operation, pollutant particulates are deposited on the insulators, forming a pollution layer on the surface. In drizzle or fog, the pollution layer absorbs moisture and becomes damp. The soluble salts, alkalis and other electrolytes contained in the pollution layer then dissolve in the water, forming a conductive layer on the insulator surface [5,6]. This degrades the insulation performance of insulators and increases the probability of pollution flashover, which threatens the safe and stable operation of power transmission lines [7,8]. The equivalent salt deposit density (ESDD) and the non-soluble deposit density (NSDD) are widely recommended to express the contamination degree of insulators, and they also provide the basis for classifying contamination levels in power systems, adjusting the leakage distance of insulator strings and guiding insulator cleaning [9]. However, ESDD and NSDD measurements are time-consuming and quite complicated, and at present it is difficult to obtain real-time data on the insulator contamination degree. Therefore, if the contamination degree of insulators can be predicted accurately, measures such as adjusting the creepage distance of the insulator string can be taken to reduce or even prevent pollution flashover, which is of great significance for guaranteeing the safety and stability of power grids.
The prediction of insulator contamination degree has received appreciable attention in the recent literature [10][11][12][13][14][15][16]. The quantities recommended to express contamination levels are ESDD, the surface conductance, the leakage current, air pollution measurements, and NSDD [10]. Leakage current is considered the most important parameter for characterising the insulator contamination degree because it provides the most effective and comprehensive information about the state of polluted insulators [17][18][19][20][21]. However, owing to practical limitations, many transmission lines are not equipped with leakage current monitoring devices, so ESDD and NSDD remain significant and popular parameters for characterising the insulator contamination degree.
Presently, contamination degree prediction methods can be divided into physical model methods, traditional statistical methods and machine learning methods. The physical model method analyses the adhesion process of particles on the insulator surface and puts forward a physical model of the collision and adhesion of pollution particles to predict the ESDD and NSDD [11]. However, because the insulator contamination process is complex, it is difficult for a physical model to explain the process completely, which leads to large prediction errors and poor generalization ability. The traditional statistical method uses historical data to predict ESDD and NSDD by multiple linear regression (MLR) [12]. This method needs a large amount of supporting data and generalises poorly when dealing with small samples. Machine learning algorithms are widely used in the prediction of insulator contamination degree, and usually include the support vector machine (SVM), the artificial neural network and the least square SVM (LSSVM) [13][14][15][16]. However, they also have defects, such as a weak ability to interpret the high-dimensional mapping of the kernel function and a long training process. The above research shows that a single prediction model is unable to achieve satisfactory prediction results because of its own limitations. A combined prediction model, in contrast, can integrate the complementary advantages of the individual models and thus improve both prediction accuracy and computational speed. Therefore, a combined model based on exploratory factor analysis (EFA) and LSSVM is established for predicting the insulator contamination degree in this study.
The innovative features of the study are described in the following two aspects:
i. The significant innovation is the setup of the EFA-LSSVM combined prediction model. Numerous original input variables are reduced to four factor variables by the EFA method, which simplifies the model while retaining most of the original information. Then the LSSVM prediction model is established with these factor variables as new input variables, and the non-dominated sorting genetic algorithm II (NSGA-II) is applied to optimise the model parameters and obtain the optimal prediction results, effectively improving the prediction accuracy and greatly shortening the computational time.
ii. In addition, the radar map is adopted to visually analyse the eight evaluation indexes of model performance. The performance differences between models can be distinguished more intuitively and effectively from the shapes of the radar maps. This provides a new approach to model performance evaluation for the contamination degree prediction of insulators.

| MAIN INFLUENCE FACTORS OF INSULATOR CONTAMINATION
The pollution of the insulator surface mainly comes from the deposition of pollution particles in the air. The pollution accumulation on the insulator surface is influenced by many factors, including the concentration of pollutants in the air, humidity, wind speed, precipitation, rainy days and the external insulation parameters of insulators [22][23][24][25][26][27][28][29][30][31]. The research in reference [26] on the natural contamination performance of insulators shows that the air quality index (AQI) has a significant correlation with the ESDD of the insulator surface: the pollution accumulation rate and the ESDD on the insulator surface increase with increasing AQI. Reference [27] concludes that pollution particles in the air are more easily deposited on the insulator surface through wet deposition, which noticeably intensifies the contamination accumulation. The effect of wind speed on insulator surface contamination is comparatively complex. According to reference [28], the insulator pollution accumulation increases with wind speed in the range of 2.2-4.5 m/s and decreases with increasing wind speed in the range of 4.5-6 m/s. When the wind speed is more than 6 m/s, the pollution degree on the insulator surface tends to be stable. The influence of precipitation on insulator contamination is more significant. References [29][30][31] study the effects of precipitation on the contamination process of insulator strings and conclude that a small amount of precipitation can effectively clean the soluble pollution on the insulator surface, whereas strong precipitation is needed to wash off the non-soluble pollution effectively. The results show that there is a negative correlation between pollution accumulation and precipitation.
Therefore, for the selected insulator, the influence factors of insulator contamination can be divided into two categories: air pollution factors and meteorological environment factors. Air pollution factors mainly include pollutant concentration indicators such as AQI, PM2.5, PM10, NO2 and SO2, while meteorological environment factors mainly include temperature, humidity, precipitation, rainy days, wind speed and other meteorological parameters.

| EFA theory and algorithm
The core of EFA is to integrate many research variables into a few easily interpretable factor variables on the premise of minimizing information loss, reducing the complexity of the model [32]. Through flexible factor rotation, each factor has clear professional significance and practical meaning.
It is assumed that there are p observable variables in n samples, and $X = (x_1, x_2, \ldots, x_p)^T$ is the observable vector after standardization. The original variables are represented by linear combinations of m (m < p) factor variables, which is the dimension reduction process of EFA. The equations of the mathematical model are as follows:

$$
\begin{cases}
x_1 = a_{11}F_1 + a_{12}F_2 + \cdots + a_{1m}F_m + \varepsilon_1 \\
x_2 = a_{21}F_1 + a_{22}F_2 + \cdots + a_{2m}F_m + \varepsilon_2 \\
\quad\vdots \\
x_p = a_{p1}F_1 + a_{p2}F_2 + \cdots + a_{pm}F_m + \varepsilon_p
\end{cases}
\tag{1}
$$

The matrix form of the model is given by Equation (2):

$$
X = AF + \varepsilon
\tag{2}
$$

where $F = (F_1, F_2, \ldots, F_m)^T$ is the factor variable of the original variable X, $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p)^T$ is the special factor of X, $A = (a_{ij})_{p \times m}$ is the factor loading matrix, and $a_{ij}$ (i = 1,2,…,p; j = 1,2,…,m) is the factor loading. Generally, the 0.5 principle is adopted to screen the original variables: when the absolute value of the factor loading is greater than 0.5, the factor variable is considered to dominate the original variable [32]. The specific modelling steps of EFA are as follows:

i. Judge whether the original variables are suitable for EFA. Generally, the Kaiser-Meyer-Olkin (KMO) test and the Bartlett spherical test are used for this judgement. When the KMO measure of sampling adequacy is above 0.5 and the significance level of the Bartlett spherical test is less than 0.05, the research variables are suitable for EFA [33,34]. Before the judgement, the original data should be standardized to eliminate the influence of differences in dimension and order of magnitude between variables. The standardization formula is as follows:

$$
x_i^{*} = \frac{x_i - \mu_i}{\sigma_i}
\tag{3}
$$

where $x_i^{*}$ is the variable after standardization, $\mu_i$ is the average value of the original variable $x_i$, and $\sigma_i$ is the standard deviation of the original data.

ii. Construct the factor variables. Firstly, the proper number of factors must be selected, and the eigenvalue method is usually used for this purpose. The eigenvalue method calculates the eigenvalues of the correlation coefficient matrix and takes the number of eigenvalues greater than 1 as the number of factors. The next step is to solve the factor loading matrix. In this study, the principal factor method is employed. The specific formulas are as follows:

$$
\begin{cases}
R = AA^T + D \\
R^{*} = R - D = AA^T \\
\hat{A} = \left(\sqrt{\lambda_1^{*}}\,u_1^{*},\ \sqrt{\lambda_2^{*}}\,u_2^{*},\ \ldots,\ \sqrt{\lambda_m^{*}}\,u_m^{*}\right)
\end{cases}
\tag{4}
$$

where m is the number of factor variables, R is the correlation coefficient matrix, A is the factor loading matrix, D is the diagonal matrix whose diagonal elements are the variances $\sigma_i^2$ of the special factor ε, $R^{*}$ is the adjusted correlation coefficient matrix, $\hat{A}$ is the solution of the factor loading matrix A obtained by the principal factor method, $\lambda_1^{*}, \lambda_2^{*}, \ldots, \lambda_m^{*}$ ($\lambda_1^{*} \geq \lambda_2^{*} \geq \cdots \geq \lambda_m^{*} \geq 0$) are the first m eigenvalues of the matrix $R^{*}$, and $u_1^{*}, u_2^{*}, \ldots, u_m^{*}$ are the corresponding orthogonal unit eigenvectors.

iii. Rotate the factors to better explain the factor variables. Factor rotation is divided into orthogonal rotation and oblique rotation. The former keeps the factor variables uncorrelated, while the latter allows them to be correlated. In this study, the varimax rotation method is used to rotate the factor variables so that they remain uncorrelated.

iv. Calculate the factor scores. In this study, the least square regression method is adopted to estimate the factor coefficients and calculate the factor scores of each factor variable on each sample. The factor scores of each factor variable can be taken as the data of the input variables in the subsequent prediction model.

The EFA model simplifies the problem by reducing the dimension of the original variables. In subsequent analysis, the factor variables can replace the original variables for regression prediction and other modelling studies.

3.2 | Least square support vector machine

| LSSVM algorithm
SVM is a powerful method in machine learning and data mining built on the Vapnik-Chervonenkis dimension and the structural risk minimization principle of Statistical Learning Theory. It maps the sample data into a high-dimensional feature space via a non-linear mapping determined by a kernel function in order to construct the optimal hyperplane [35]. LSSVM replaces the inequality constraints of SVM with equality constraints and uses a squared error loss, so that training reduces to solving a set of linear equations. The specific algorithm of LSSVM applied to regression prediction problems is described as follows.
A non-linear mapping function is selected to map the original sample space to the high-dimensional feature space for the given training sample set $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^m$ is the m-dimensional input vector and $y_i \in \mathbb{R}$ is the output. The linear regression function constructed by the LSSVM model in the high-dimensional feature space is as follows:

$$
y(x) = \omega^T \Phi(x) + b
\tag{5}
$$

where Φ(x) is the mapping function from the original low-dimensional space to the high-dimensional space, ω is the weight vector of the hyperplane in the feature space, and b is the offset.
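According to the structural risk minimization principle, the objective function of the LSSVM model is given by Equation (6):

$$
\min_{\omega, b, \xi} J(\omega, \xi) = \frac{1}{2}\omega^T\omega + \frac{\gamma}{2}\sum_{i=1}^{n}\xi_i^2
\quad \text{s.t.} \quad y_i = \omega^T\Phi(x_i) + b + \xi_i,\ i = 1,2,\ldots,n
\tag{6}
$$

where γ is the penalty factor used to balance the model complexity and the training error, and $\xi_i$ is the error between the actual value of the output variable and the predictive value. The Lagrange multipliers $\alpha_i$ (i = 1,2,…,n) are introduced to solve the above objective function, as shown in Equation (7):

$$
L(\omega, b, \xi, \alpha) = J(\omega, \xi) - \sum_{i=1}^{n}\alpha_i\left[\omega^T\Phi(x_i) + b + \xi_i - y_i\right]
\tag{7}
$$

The partial derivatives of L with respect to ω, b, ξ and α are set equal to 0, which gives Equation (8):

$$
\begin{cases}
\dfrac{\partial L}{\partial \omega} = 0 \ \Rightarrow\ \omega = \sum_{i=1}^{n}\alpha_i\Phi(x_i) \\
\dfrac{\partial L}{\partial b} = 0 \ \Rightarrow\ \sum_{i=1}^{n}\alpha_i = 0 \\
\dfrac{\partial L}{\partial \xi_i} = 0 \ \Rightarrow\ \alpha_i = \gamma\xi_i \\
\dfrac{\partial L}{\partial \alpha_i} = 0 \ \Rightarrow\ \omega^T\Phi(x_i) + b + \xi_i - y_i = 0
\end{cases}
\tag{8}
$$

By eliminating ω and $\xi_i$ in Equation (8), the following matrix equation is obtained:

$$
\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K + \gamma^{-1}I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ y \end{bmatrix}
\tag{9}
$$

where $y = [y_1, y_2, \ldots, y_n]^T$, $\mathbf{1} = [1, 1, \ldots, 1]^T$, $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_n]^T$, I is the identity matrix, and K is the kernel matrix with entries $K_{ij} = k(x_i, x_j) = \Phi(x_i)^T\Phi(x_j)$. According to functional theory, the function $k(x_i, x_j)$ is a kernel function satisfying Mercer's theorem [36]. The selection of the kernel function directly affects the model performance. To improve generalization ability and prediction accuracy, the Gaussian radial basis function is chosen as the kernel function of the LSSVM model in this study:

$$
k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)
\tag{10}
$$

where $\|x_i - x_j\|^2$ is the squared Euclidean distance between the vectors $x_i$ and $x_j$, and σ is the kernel parameter reflecting the distribution characteristics of the sample data. Applying the Gaussian radial basis function to the LSSVM model gives the regression function shown in Equation (11):

$$
y(x) = \sum_{i=1}^{n}\alpha_i \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right) + b
\tag{11}
$$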
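The derivation above reduces LSSVM training to one linear solve. A minimal sketch of that step is given below, assuming NumPy, a Gaussian kernel written with $2\sigma^2$ in the denominator, and synthetic data; the function names `rbf_kernel`, `lssvm_fit` and `lssvm_predict` are hypothetical and are not taken from the study.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix, k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM linear system for the offset b and multipliers alpha."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    # Block system: [[0, 1^T], [1, K + I / gamma]] [b, alpha]^T = [0, y]^T
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(M, rhs)
    return solution[0], solution[1:]


def lssvm_predict(X_train, b, alpha, sigma, X_new):
    """Evaluate the kernel expansion sum_i alpha_i k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Synthetic one-dimensional regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X[:, 0])

gamma, sigma = 100.0, 1.0  # arbitrary values, not the tuned ones
b, alpha = lssvm_fit(X, y, gamma, sigma)
y_fit = lssvm_predict(X, b, alpha, sigma, X)
```

In this sketch the training residuals equal `alpha / gamma`, which mirrors the stationarity condition linking each multiplier to its error term, and the multipliers sum to zero up to round-off.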

| Optimization of LSSVM model parameters
The penalty factor γ and the Gaussian kernel parameter σ need to be determined in advance to build the LSSVM prediction model, and both affect the prediction accuracy and generalization ability. Therefore, NSGA-II is introduced in this study to search automatically for the optimal parameters of the LSSVM model. The main steps of the algorithm are as follows:
i. Standardize the data of the training sample set according to Equation (3). Set the population size and the maximum number of iterations, and use real number coding for the chromosomes.
ii. Set the number of objective functions, the number of decision variables and the threshold values. Generate the initial population randomly, and calculate the objective function value of each individual in the current population. Fast non-dominated sorting, selection, crossover and mutation operations are then carried out. The detailed process is described in reference [37].
iii. When the number of iterations reaches the preset maximum, stop the iteration. The optimal model parameters are then obtained.
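As an illustration of the fast non-dominated sorting operation named in step ii, the sketch below ranks candidate parameter pairs by two objectives that are both minimized. It covers only the sorting step, not the full NSGA-II loop (crowding distance, selection, crossover and mutation are omitted). The objective values are invented, and expressing the two objectives as (RMSE, 1 - R²) is an assumption made here so that both can be minimized.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objectives):
    """Return a list of fronts; each front is a list of indices into `objectives`."""
    n = len(objectives)
    dominated_set = [[] for _ in range(n)]  # individuals that i dominates
    domination_count = [0] * n              # number of individuals dominating i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objectives[i], objectives[j]):
                dominated_set[i].append(j)
            elif dominates(objectives[j], objectives[i]):
                domination_count[i] += 1
        if domination_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        next_front = []
        for i in fronts[k]:
            for j in dominated_set[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        k += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical (RMSE, 1 - R^2) values for five candidate (gamma, sigma) pairs.
candidates = [(0.10, 0.05), (0.08, 0.07), (0.12, 0.06), (0.08, 0.05), (0.20, 0.30)]
fronts = fast_non_dominated_sort(candidates)
```

Candidate 3 is at least as good as every other candidate in both objectives, so it forms the first front on its own; candidates 0 and 1 trade off against each other and share the second front.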

| Algorithm flow of EFA-LSSVM combined model
In this study, a combined prediction model based on EFA and LSSVM improved by NSGA-II is proposed. The detailed steps of the proposed model are as follows.
i. By means of the EFA method, the original p-dimensional variables are reduced to m factor variables, and the factor scores of each factor variable on the n samples are obtained.
ii. Taking the factor variables as input variables, a new sample set $\{x_i, y_i\}_{i=1}^{n}$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$, is divided into a training set and a test set to build the LSSVM regression model.
iii. The Gaussian radial basis function is selected as the kernel function of the LSSVM model. The penalty factor γ and the Gaussian kernel parameter σ are solved by NSGA-II on the training set data, and the optimal parameters are selected to optimise the LSSVM model.
iv. The test set data are input into the LSSVM model optimised by NSGA-II to obtain the prediction results.
The flowchart of the EFA-LSSVM combined model algorithm is shown in Figure 1.

| PERFORMANCE EVALUATION METHOD
In this study, eight evaluation indexes are selected to evaluate the model performance: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), mean squared percentage error (MSPE), Theil inequality coefficient (TIC), coefficient of determination ($R^2$), modelling time ($T_M$) and prediction time ($T_P$). The detailed expressions are given in Table 1, where $y_i$ is the actual value, $\hat{y}_i$ is the predictive value, $\bar{y}$ is the mean value of the test set data, and k is the number of samples in the test set.
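For orientation, the six error-based indexes can be computed as in the sketch below. Table 1 is not reproduced in this text, so the expressions used here are the common textbook definitions and should be read as assumptions; in particular, MAPE and MSPE are returned as fractions rather than percentages, and MSPE is taken as the mean of the squared relative errors. The function name `evaluation_indexes` is hypothetical. The two timing indexes would be measured separately, for example with `time.perf_counter`.

```python
import math


def evaluation_indexes(y_true, y_pred):
    """Compute MAE, RMSE, MAPE, MSPE, TIC and R^2 for one test set."""
    k = len(y_true)
    errors = [a - p for a, p in zip(y_true, y_pred)]
    rel_errors = [e / a for e, a in zip(errors, y_true)]  # requires a != 0
    y_mean = sum(y_true) / k
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / k)
    rms_true = math.sqrt(sum(a * a for a in y_true) / k)
    rms_pred = math.sqrt(sum(p * p for p in y_pred) / k)
    return {
        "MAE": sum(abs(e) for e in errors) / k,
        "RMSE": rmse,
        "MAPE": sum(abs(r) for r in rel_errors) / k,
        "MSPE": sum(r * r for r in rel_errors) / k,
        "TIC": rmse / (rms_true + rms_pred),
        "R2": 1.0 - sse / sum((a - y_mean) ** 2 for a in y_true),
    }


# Tiny invented example: actual values 2 and 4, predictions 1 and 5.
idx = evaluation_indexes([2.0, 4.0], [1.0, 5.0])
```

Lower values are better for every index in the dictionary except `R2`, for which values closer to 1 are better.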

| Acquisition of test data
The sample data come from the natural contamination test of insulators. The test sites are located on the Shandong section of the ±660 kV Yin-dong direct-current transmission line: the #2125 transmission tower in Jiyang District of Jinan, the #2409 transmission tower in Linzi District of Zibo, and the #2583 transmission tower in Hanting District of Weifang. The porcelain insulator XHP-210 is selected as the test object for the natural contamination test. In strict accordance with the method of the enterprise standard Q/GDW 1152.2-2014 [38], the insulator strings (five pieces per string) are suspended vertically under the negative line and under the cross arm of the ±660 kV transmission tower, respectively. The insulators suspended vertically under the negative line are referred to as charged insulators, and those suspended vertically under the cross arm are referred to as uncharged insulators. The specific suspension mode of the test insulators is displayed in Figure 2. The structure and parameters of the test insulators are shown in Table 2, in which D is the shed diameter, H is the configuration height, and L is the leakage distance.
All the test insulators are thoroughly washed with deionized water before the test. The hanging time of test insulators is from [...] The insulators are cleaned after each measurement so that pollution accumulation can continue. The complete pollution sampling information of the natural contamination test is listed in Tables S1 and S2 of the Supplementary material.

| Variable selection
According to the analysis of the influence factors of insulator contamination, [...] $\{X^{(1)}, \cdots, X^{(i)}, \cdots, X^{(6)}\}$, where $X^{(i)}$ is the data of the i-th month during the contamination period. Figure 4 presents the detailed data of the original input variables corresponding to each sample. As shown in Figure 4, the value range of AQI is 70.26 [...] Table 4, which demonstrate that the original variables are suitable for building the EFA model.
The eigenvalue method is used to select the factor variables. The eigenvalues of the correlation coefficient matrix are given in Table S3 of the Supplementary material. There are four eigenvalues greater than 1, and thus four factor variables are selected to represent the original input variables. The factor loading matrix is calculated by the principal factor method, and the varimax rotation method is employed to obtain the rotated factor loading matrix. The 0.5 principle is then adopted to screen the original variables. Table 5 presents the actual meaning of each factor variable and the corresponding variance contribution rate. As shown in Table 5, the original variables dominated by the factor variable $F_1$ are $I_{AQI}$, $W_{PM2.5}$, $W_{PM10}$, $W_{NO2}$ and $W_{SO2}$; the original variables dominated by $F_2$ are T, P and $D_R$; the original variable dominated by $F_3$ is H; and the original variable dominated by $F_4$ is $D_W$. Combined with the meanings of the original variables, $F_1$ reflects the air pollution state and can be directly regarded as the air pollution factor; $F_2$ reflects the temperature and rainfall indexes and can be regarded as the temperature and rainfall factor; $F_3$ reflects the humidity index and can be regarded as the humidity factor; and $F_4$ reflects the wind index and can be regarded as the wind factor. The cumulative variance contribution rate of the four factor variables reaches 94.41%, indicating that these factor variables reflect most of the information contained in the original input variables. Finally, the least square regression [...] The factor variables constructed in this study reflect most of the information of the original variables and have clear practical meaning, indicating that EFA is highly effective. The factor scores of each factor variable can be applied in the subsequent model.
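The EFA workflow used in this section (standardization, eigenvalue criterion, loading extraction, varimax rotation, regression factor scores) can be sketched as follows. This is a simplified illustration under stated assumptions rather than the procedure of the study: it uses NumPy, synthetic data with two latent factors, and principal component extraction on the correlation matrix (special variances set to zero) in place of the principal factor method. The KMO and Bartlett tests are omitted, and the names `varimax` and `efa_sketch` are hypothetical.

```python
import numpy as np


def varimax(A, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix A (p x m)."""
    p, m = A.shape
    T = np.eye(m)
    criterion = 0.0
    for _ in range(max_iter):
        L = A @ T
        grad = A.T @ (L**3 - L @ np.diag(np.sum(L**2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(grad)
        T = U @ Vt  # closest orthogonal matrix to the gradient
        if s.sum() < criterion * (1.0 + tol):
            break
        criterion = s.sum()
    return A @ T


def efa_sketch(X):
    """Return (number of factors, loadings, rotated loadings, factor scores)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each column
    R = np.corrcoef(Z, rowvar=False)                  # correlation matrix
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]                  # sort descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    m = int(np.sum(eigval > 1.0))                     # eigenvalue-greater-than-1 rule
    A = eigvec[:, :m] * np.sqrt(eigval[:m])           # unrotated loadings
    A_rot = varimax(A)
    scores = Z @ np.linalg.solve(R, A_rot)            # regression-method scores
    return m, A, A_rot, scores


# Synthetic data: six observed variables driven by two latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
W = np.array([[0.9, 0.0]] * 3 + [[0.0, 0.9]] * 3)
X = latent @ W.T + 0.3 * rng.normal(size=(200, 6))

m, A, A_rot, scores = efa_sketch(X)
dominant = np.abs(A_rot) > 0.5  # the 0.5 principle for screening variables
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged, and the regression scores remain uncorrelated with unit variance, which is the property that makes them convenient inputs for the subsequent LSSVM model.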

| Prediction modeling based on LSSVM
After the dimension reduction of the original input variables by the EFA method, the factor variables are taken as new input variables, and the factor scores of each factor variable are taken as the data of the new input variables. The ESDD and NSDD of insulators are used as output variables to construct a new sample data set. The specific meaning of the sample variables is given in Table 6.
The sample set is divided into a training set and a test set in a proportion of approximately 3:1. The first 23 groups of data are taken as the training sample set, and the last seven groups of data are taken as the test sample set. The LSSVM prediction model is then established. Taking RMSE and $R^2$ as objective functions, the LSSVM model parameters γ and σ are optimised by NSGA-II on the training set data. The population size is set to 100 and the maximum number of iterations to 1000. The crossover probability and mutation probability are 0.9 and 0.01, respectively. The value range of γ is [1,1000], and the value range of σ is [1,50]. The optimal parameters of the LSSVM model are shown in Table 7. Four groups of insulator contamination degree prediction models are established based on the optimal parameters, and the corresponding modelling time is recorded.

| Results analysis and model evaluation
The prediction model established in Section 5.4 is applied to predict the contamination degree of insulators, and the corresponding prediction time is recorded. For comparison, three classical models based on the LSSVM, back propagation neural network (BPNN) and MLR methods are built to predict the contamination degree of insulators. The genetic algorithm is employed to optimise the parameters of these models, with 1000 iterations. The modelling time and prediction time of these three models are also recorded. To demonstrate the prediction results of the four models more intuitively, the comparisons of the predicted contamination degree among the four models are illustrated in Figure 5. As presented, the prediction effect of the EFA-LSSVM model is significantly better than that of the other three models. The EFA-LSSVM model has the highest prediction stability and accuracy, with a minimum relative error of only 0.311% and a maximum relative error of 5.701%. The performance of the LSSVM model is slightly worse, and all its relative errors are within 11%. The MLR model has the worst prediction accuracy and the largest error fluctuation, with a maximum relative error above 20%. According to the model performance evaluation method proposed in Section 4, the evaluation indexes of the four prediction models are calculated, and the results are shown in Table 8, which lists only the evaluation indexes of the charged insulator ESDD prediction models owing to limited space. The indexes MAE, RMSE, MAPE, MSPE, TIC, $T_M$ and $T_P$ of the EFA-LSSVM model are lower than those of the other three models, and the index $R^2$ of the EFA-LSSVM model is closer to 1. From Table 8 [...] other three models is 64.36%, 63.27% and 83.6% as calculated by the following formula. [...]
where $EI_{LSSVM}$, $EI_{BPNN}$ [...] From the perspective of computational time, the EFA-LSSVM model reduces the dimension of the original variables, which effectively decreases the complexity and run time of the model. By using NSGA-II to optimise the model parameters, the modelling time and prediction time of the EFA-LSSVM model are greatly shortened compared with the other three models, which is of great engineering significance. The results indicate that the EFA-LSSVM method has higher prediction accuracy, stronger generalization ability and better prediction effect than the LSSVM, BPNN and MLR methods. Therefore, the proposed method achieves better performance than the existing methods.
To compare the performance differences of the four models more intuitively, the radar map is adopted to visually analyse the above eight evaluation indexes. Because the evaluation indexes differ greatly in order of magnitude, the ranges of the different coordinates in the radar maps should be restricted and unified. The ranges of the coordinates in the radar maps are set as follows: [...] where $V_c$ is the value of the coordinate, and $V_{min}$ and $V_{max}$ are the minimum and maximum values of the evaluation index corresponding to the coordinate, respectively. The ranges of the eight evaluation indexes in the radar maps are stated in Table 9.
To better present the advantages and disadvantages of the four models on each evaluation index, the maximum coordinate values of MAE, RMSE, MAPE, MSPE, TIC, $T_M$ and $T_P$ are positioned at the centre point of the radar map, and the axis scale of these evaluation indexes decreases gradually from the centre to the outside. Conversely, the minimum coordinate value of the evaluation index $R^2$ is positioned at the centre point, and the corresponding axis scale increases gradually from the centre to the outside. In this way, a larger polygon always indicates better performance. Figure 6 clearly shows that the EFA-LSSVM model has more robust and better prediction performance than the other three models. Therefore, using EFA-LSSVM for insulator contamination degree prediction is superior to using the other three classical models. Compared with the other three models, the EFA-LSSVM model proposed in this study first uses the EFA method to reduce the dimension of the original variables, which decreases the complexity of the model while retaining most of the original information. Then, the LSSVM prediction model is established, and NSGA-II is adopted to optimise the model parameters, which greatly improves the modelling ability and prediction accuracy of the model.
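The axis orientation described above can be illustrated with a small helper that maps an index value to a radial position between 0 (centre) and 1 (outer edge). This is only a sketch: linear min-max scaling between the axis limits is an assumption made here, the numerical limits are invented rather than taken from Table 9, and the function name `radial_position` is hypothetical.

```python
def radial_position(value, v_min, v_max, larger_is_better=False):
    """Map an index value to [0, 1], where 1 is the outer edge of the radar map.

    For error-like and time indexes the axis maximum sits at the centre, so
    smaller values land further out. For R^2 the axis minimum sits at the centre.
    """
    if not v_min < v_max:
        raise ValueError("v_min must be smaller than v_max")
    fraction = (value - v_min) / (v_max - v_min)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to the axis range
    return fraction if larger_is_better else 1.0 - fraction


# Invented axis limits and index values, for illustration only.
mae_position = radial_position(0.02, 0.0, 0.10)
r2_position = radial_position(0.95, 0.5, 1.0, larger_is_better=True)
```

With this orientation, a model with a small MAE and a large R² is plotted near the outer edge on both axes, so the enclosed area grows as performance improves.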

| Practical engineering application
According to the enterprise standard Q/GDW 1152.2-2014 [38] and the research in reference [40], the ESDD and NSDD design values of different polluted areas in the Shandong section of the ±660 kV Yin-dong transmission line are listed in Table 10. When the ESDD and NSDD of insulators exceed the design values, the creepage distance of the insulator string should be adjusted according to the actual contamination degree. At present, however, the external insulation configuration of the ±660 kV Yin-dong transmission line should meet the need for long-term non-cleaning [40], and it is no longer necessary to clean the insulators regularly. Here the proposed EFA-LSSVM model can be adopted to accurately predict the insulator contamination degree through the above research in [...] Tables 11 and 12. Results indicate that the ESDD and NSDD are predicted accurately, with less than 3.21% relative error, by the proposed model. Because the tower is in a medium polluted area, the ESDD and NSDD both exceed the design values according to the prediction results. Therefore, we suggest increasing the creepage distance of the insulator string appropriately.
In summary, the ESDD and NSDD of insulators during operation can be predicted accurately by the proposed EFA-LSSVM model. When the predicted values of ESDD and NSDD exceed the upper limits of the design values, pre-warning information can be supplied and the electric power company can be advised to increase the creepage distance in time, for example by increasing the number of insulators in the string or by replacing the insulators with ones of larger creepage distance.

| Limitations and future work
Although the proposed EFA-LSSVM model has achieved good results in the ESDD and NSDD prediction of insulators, this study still has some limitations because the leakage current parameters are not taken into consideration. As a dynamic parameter, leakage current is a comprehensive reflection of the operating voltage, climate condition and contamination degree, and it can be used for on-line detection [17][18][19][20][21]. Compared with ESDD and NSDD, the leakage current provides more effective and comprehensive information about the state of polluted insulators [41]. It is therefore valuable and practical to predict the contamination degree of the insulator surface based on the characteristics of leakage current, and it is necessary to incorporate the leakage current parameters into the insulator contamination degree prediction model in future work. According to the research in references [42][43][44], three characteristics of the leakage current, namely the mean value, the maximum value and the standard deviation, can jointly reveal the actual contamination level of the insulator. The three characteristics are expressed as follows:

$$
I_e = \frac{1}{N}\sum_{i=1}^{N} I(i), \qquad
I_{max} = \max_{1 \le i \le N} I(i), \qquad
\sigma_I = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(I(i) - I_e\right)^2}
$$

where N denotes the total number of sampling points, I(i) denotes the value of the leakage current during the sampling period, $I_e$ denotes the average value of the leakage current, $I_{max}$ denotes the maximum value of the leakage current, and $\sigma_I$ denotes the standard deviation of the leakage current. Figure 7 illustrates the schematic diagram of the improved prediction model of insulator contamination degree. In this study, the air pollution parameters and meteorological environment parameters are considered as input variables, and these parameters are marked with a red box. The leakage current parameters $I_e$, $I_{max}$ and $\sigma_I$ will be added to the model input variables in future research, and the ideas are as follows.
Firstly, proper data must be selected, because no leakage current flows through the insulator surface under dry conditions. Secondly, the KMO and Bartlett spherical tests are needed to judge whether the original input variables, including the leakage current parameters, are suitable for EFA, and the corresponding data must be standardized. The numerous original input variables are then reduced to a few factor variables by the EFA method. Next, the factor variables are taken as new input variables, with ESDD and NSDD as output variables, and these data are input into the LSSVM model. Finally, the model parameters are optimised by the NSGA-II algorithm and the prediction results are obtained. The computational time of the EFA-LSSVM model may increase because of the larger number of original input variables. Meanwhile, the contamination degree is expected to be predicted more accurately once the leakage current parameters are considered.
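The three leakage current characteristics proposed above as future model inputs are simple summary statistics of a sampled record. A minimal sketch is shown below, assuming the samples are current magnitudes collected under wet conditions and that the standard deviation uses the population form (division by N); the function name `leakage_current_features` and the sample values are hypothetical.

```python
import math


def leakage_current_features(samples):
    """Return (mean, maximum, standard deviation) of leakage current samples."""
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    mean = sum(samples) / n
    peak = max(samples)
    # Population standard deviation (division by N) is an assumption here.
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
    return mean, peak, std


# Invented sample record, in arbitrary units.
i_e, i_max, sigma_i = leakage_current_features([1.0, 2.0, 3.0, 4.0])
```

These three values would be appended to the air pollution and meteorological variables of each sample before the suitability tests and the EFA dimension reduction are repeated.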

| CONCLUSION
The pollution flashover accidents of insulators have been a long-standing problem for the safe and stable operation of power systems. Accurate prediction of the contamination degree on the insulator surface is significant for taking appropriate measures to prevent flashover in advance. This research has resulted in a combined model based on EFA and LSSVM to predict the insulator contamination degree. The combined model proposed in this study is thoroughly tested with several groups of data sets of insulator contamination degree, and the radar map is adopted to evaluate the model performance. The results validate the efficacy and accuracy of the proposed combined model. The important conclusions of this study are as follows:
i. Several air pollution factor variables and meteorological environment factor variables are selected as the original input variables. The EFA method is introduced to reduce the dimension of these input variables, and four uncorrelated factor variables are obtained, which greatly simplifies the model while retaining most of the original information.
ii. The factor variables are taken as new input variables to establish insulator contamination degree prediction models based on LSSVM. The model parameters are optimised by the NSGA-II algorithm, which enhances the generalization ability of the model, further reduces the prediction error, and effectively improves the prediction accuracy and computational speed.
iii. Eight evaluation indexes are employed to evaluate the model performance. Compared with the LSSVM, BPNN and MLR models, the EFA-LSSVM combined model established in this study greatly shortens the computational time and has higher prediction stability and accuracy. Meanwhile, the radar map is used to visually analyse the evaluation indexes, so that the performance of different models can be distinguished more intuitively and effectively from the shapes of the radar maps.
iv. The ESDD and NSDD of insulators during operation can be predicted accurately by the proposed EFA-LSSVM model, which provides a proper guideline for the selection of the creepage distance of insulator strings in practical engineering applications.
Currently, this study focuses on building a new and effective combined model for the ESDD and NSDD prediction of insulators, and the model performance is then evaluated by using the radar map. In fact, the ESDD, NSDD and leakage current are all considered significant parameters for characterising the contamination degree of the insulator. The leakage current provides effective and comprehensive information about the state of polluted insulators. Therefore, our future research will focus on the contamination degree prediction of the insulator surface based on the characteristics of leakage current.

TABLE 12 NSDD prediction results of insulators of a certain tower in the ±660 kV transmission line by using the EFA-LSSVM model

FIGURE 7 Schematic diagram of improved prediction model of insulator contamination degree. EFA-LSSVM, exploratory factor analysis-least square support vector machine; ESDD, equivalent salt deposit density; NSDD, non-soluble deposit density