Identification of ultra-high-frequency PD signals in gas-insulated switchgear based on moment features considering electromagnetic mode

The feature extraction and pattern recognition techniques are of great importance for assessing the insulation condition of gas-insulated switchgear. In this work, the ultra-high-frequency partial discharge (PD) signals generated by four types of typical insulation defects are analysed using the S-transform, and the greyscale image of the time-frequency representation is divided into five regions according to the cutoff frequencies of the $\mathrm{TE}_{m1}$ modes. Then, the three low-order moments of every subregion are extracted, and feature selection is performed based on the J criterion. To confirm the effectiveness of the selected moment features, which take the electromagnetic modes into account, the support vector machine, k-nearest neighbour and particle-swarm-optimised extreme learning machine (ELM) are used to classify the PD type, achieving recognition accuracies of 92, 88.5 and 95%, respectively. In addition, the results show that the ELM offers good generalisation performance with the fastest testing speed and a fast learning speed, and is thus more suitable for real-time PD detection.


Introduction
Gas-insulated switchgear (GIS) is compact, metal-encapsulated switchgear consisting of high-voltage components such as circuit breakers and disconnectors. It is widely used in power systems owing to its high reliability, low maintenance and compact size [1]. A dielectric breakdown in GIS can result in serious outages and thereby cause enormous economic losses [2]. Partial discharge (PD) activity often occurs before insulation failure [3]. Thus, PD detection is of great significance for diagnosing incipient faults in GIS [4].
Among the various measurement methods, the ultra-high-frequency (UHF) method has attracted increasing attention because of its high sensitivity and strong anti-interference capability [5]. Different types of PD cause different degrees of harm. For example, PD arising from a protrusion or from a void in epoxy resin is more dangerous than PD from a floating metal or bouncing particles [6]. Therefore, identifying the PD source can guide the maintenance strategy.
Generally, the patterns used to recognise PD in GIS fall into two categories, i.e. phase-resolved PD (PRPD) [1] and time-resolved PD (TRPD) [4,6]. It is sometimes difficult to obtain the phase information of PD in the field [7]. Besides, the PRPD mode requires a large memory to record data over up to one thousand power-frequency cycles [8]. In contrast, the TRPD mode needs to acquire only a single PD waveform. Since a PD pulse is a transient, non-stationary signal, a purely time-domain or purely frequency-domain description cannot provide complete information, and time-frequency (TF) analysis is a more powerful tool for characterising PD signals [9][10][11][12]. The short-time Fourier transform (STFT) [9], the wavelet transform (WT) [10], and the S-transform (ST), which uniquely combines frequency-dependent resolution with an absolutely referenced phase [11,12], have been employed to extract PD feature parameters. However, these features have been applied only to PD source recognition in power transformers or cables. Owing to its coaxial structure, GIS has distinctive propagation properties for electromagnetic (EM) waves. For instance, the transverse EM (TEM) mode has no cutoff frequency $f_c$, whereas the higher-order modes, including the transverse electric (TE) and transverse magnetic (TM) modes, each have a corresponding $f_c$ [13]. In addition, when UHF signals pass through an insulation spacer, the attenuation is mainly due to the superposition of the reduced TE and TM modes [13]; at a disconnecting part, the TEM mode component is reflected, whereas the higher-frequency component above the $f_c$ of $\mathrm{TE}_{11}$ can propagate [14]. Furthermore, the TEM mode component was found to become the main component after UHF signals pass through an L-shaped branch [15]. Therefore, the transmission characteristics of the EM modes have a significant influence on the detected UHF signals.
Nevertheless, the existing methods for extracting features of UHF PD signals in GIS [16][17][18][19] do not take into account the impact of the EM modes on the recognition accuracy of the PD source. In this work, dividing the TF plane according to the EM modes is investigated as a way to improve this accuracy.
Various classification techniques, such as k-means clustering [6], fuzzy c-means clustering [20], the probabilistic neural network [21], the support vector machine (SVM) [22] and k-nearest neighbour (KNN) [23], have been applied to PD source identification in high-voltage equipment. However, the training and/or testing phases of these classifiers can be time-consuming. Recently, the extreme learning machine (ELM) [24], a learning algorithm for single-hidden-layer feedforward neural networks (SLFNs), has been reported to process large data sets efficiently with excellent classification capability. However, in [24], the input weights and biases are generated randomly. Since these parameters strongly affect the performance of the ELM, they should be tuned.
The novelty and contributions of this paper are as follows: (i) According to the cutoff frequencies of the $\mathrm{TE}_{m1}$ modes in the range 0-2 GHz, the TF plane is divided into five regions, and the three low-order moments of every subregion are extracted as the original feature space. For the ELM, SVM and KNN classifiers, the recognition accuracies based on the six moments selected by the J criterion are 26.5, 29 and 37 percentage points higher, respectively, than those based on the three low-order moments of the whole greyscale image. This demonstrates that the selected feature parameters significantly improve the recognition accuracy of the PD source in GIS, and that segmenting the image according to the cutoff frequencies of the EM modes is effective because it accounts for the coaxial structure of GIS.
(ii) In this work, the input weights and biases of the ELM are tuned by particle swarm optimisation (PSO) rather than generated randomly. When applied to GIS PD pattern recognition, the ELM provides generalisation performance comparable to or better than that of SVM and KNN with the fastest testing speed, making it more appropriate for real-time online PD monitoring. Moreover, this is the first detailed application of the PSO-ELM algorithm to GIS PD source identification.

This paper is organised as follows: Section 2 deals with the UHF PD measurement system. The procedures of signal preprocessing and feature extraction are described in Section 3, followed by a brief introduction to the PSO-optimised ELM in Section 4. Section 5 investigates the performance of the ELM based on the moment features and compares it with that of the SVM and KNN classifiers. Finally, Section 6 draws the conclusions.

UHF PD measurement system

Fig. 1 depicts four types of artificial insulation defects. The metallic protrusion defect consists of a needle-to-plane electrode pair. The floating metal defect is realised by suspending an insulation ring that holds a floating brass piece; the HV electrode and the floating brass have tapered tips so that they discharge easily, and the plane electrode has rounded corners to avoid corona discharge. To imitate the free metallic particle defect, fifteen aluminium foils of various diameters are placed in a bowl-shaped electrode. The void defect is composed of three layers of epoxy resin, the middle one of which has a small hole. Although the artificial defect models are not exactly the same as real insulation defects, their PD mechanisms are identical [25][26][27]. Fig. 2 presents the experimental setup for UHF PD detection. The artificial insulation defect is placed in a Perspex container to generate PD signals, and this PD source, composed of the insulation defect and the container, is located at the left end of the GIS.
The central conductor and enclosure diameters of the 220 kV GIS model are 90 and 320 mm, respectively. A disc coupler is used as the UHF antenna. Owing to the limited measuring range of the gigahertz TEM cell [28], its effective height, which characterises its sensitivity to the incident electric field, is measured only in the range 0.2-2 GHz; the results are plotted in Fig. 3. A Keysight DSO9404A digital oscilloscope with an analogue bandwidth of 4 GHz, operated at a sampling rate of 10 GSa/s, is used to acquire the PD signals. To ensure the diversity of samples from the same defect, two experimental parameters are controlled. The first is the test voltage: for each class of defect, three test voltages are applied, and 100 samples are acquired at each test voltage. For example, the three test voltages for the void defect are 18, 19 and 20 kV. Changing the test voltage changes the discharge quantity, which reflects the severity of discharge. The second is the duration of the applied voltage, since the PD waveforms (e.g. the signal amplitude) vary slightly with the time for which the voltage has been applied. In the experiments, 10 PD pulses are acquired consecutively every ten minutes. The samples therefore cover different discharge severities and voltage durations. The PD inception voltage (PDIV), test voltages and numbers of samples are given in Table 1; the PDIV mainly depends on the structure and dimensions of the defect models. Fig. 4 shows typical UHF signals after wavelet-based denoising [29], and Fig. 5 gives the corresponding normalised frequency spectra. For the protrusion defect, the energy of the UHF signals is mostly distributed in the range 0.4-1.9 GHz; for the floating metal defect, within 0.3-1.4 GHz; for the particle defect, within 0.5-1.8 GHz; and for the void defect, within 0.5-1.3 GHz, with the shortest signal duration.

Signal preprocessing and feature extraction
In this work, the PD source identification is split into seven steps, and the flowchart is shown in Fig. 6.

S-transform
In the ST, the width of the Gaussian time window is inversely proportional to frequency, giving higher frequency resolution at low frequencies and higher temporal resolution at high frequencies. The ST thus combines the merits of the STFT and WT [11,12]. The ST of a signal $x(t)$ is defined as

$$S(\tau, f) = \int_{-\infty}^{\infty} x(t)\,\frac{|f|}{\sqrt{2\pi}}\,e^{-\frac{(\tau - t)^2 f^2}{2}}\,e^{-\mathrm{i}2\pi f t}\,\mathrm{d}t \qquad (1)$$

where $f$ represents the frequency of the UHF signal and $\tau$ is a parameter that controls the position of the window on the time axis. After the ST, taking the modulus and applying min-max normalisation, the TF matrix is projected into a greyscale image, as shown in Fig. 7. The deeper the red colour, the larger the magnitude of the TF component of the UHF signal at that time and frequency; conversely, the deeper the blue colour, the smaller the magnitude.
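For sampled signals, the ST is usually evaluated in the frequency domain, with one inverse FFT per frequency bin. The sketch below follows that standard FFT-based formulation (Stockwell et al.) in Python/NumPy and then applies the modulus and min-max normalisation described above. It is an illustrative implementation under these assumptions, not the authors' code, and the function names `stockwell_transform` and `to_greyscale` are hypothetical.

```python
import numpy as np

def stockwell_transform(x):
    """Discrete S-transform computed via the FFT (hypothetical helper).

    Returns a complex array of shape (N//2 + 1, N): row n is frequency bin n,
    column j is time sample j.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    X = np.fft.fft(x)
    m = np.fft.fftfreq(N) * N          # integer frequency offsets in FFT order
    S = np.empty((N // 2 + 1, N), dtype=complex)
    S[0, :] = x.mean()                  # zero-frequency row is the signal mean
    for n in range(1, N // 2 + 1):
        # Gaussian window in the frequency domain; it widens with n, so the
        # corresponding time window narrows as frequency increases
        gauss = np.exp(-2.0 * np.pi**2 * m**2 / n**2)
        # np.roll(X, -n)[m] == X[(m + n) mod N]
        S[n, :] = np.fft.ifft(np.roll(X, -n) * gauss)
    return S

def to_greyscale(S):
    """Modulus of the TF matrix followed by min-max normalisation to [0, 1]."""
    mag = np.abs(S)
    return (mag - mag.min()) / (mag.max() - mag.min())
```

Row n of the output corresponds to the frequency n·f_s/N, where N is the record length; at the 10 GSa/s sampling rate used here, the 0-2 GHz band therefore occupies roughly the first 0.2N rows. Looping over frequency bins keeps the code close to the definition; a vectorised version would trade memory for speed.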

TF plane division
GIS, which consists of a central conductor and an enclosure, is modelled as a cylindrical coaxial waveguide. Theoretically, the cutoff frequency $f_c$ of the TEM mode is 0 Hz, so all frequencies of the TEM mode, from direct current upwards, can exist in GIS. For the higher-order modes, however, only frequencies above the $f_c$ of the corresponding mode can propagate. The cutoff condition of the TE modes is [30]

$$J_n'(ka)\,Y_n'(kb) - J_n'(kb)\,Y_n'(ka) = 0$$

where $k = 2\pi f_c / c$ is the cutoff wave number; $a$ and $b$ are the radii of the central conductor and the enclosure, respectively; and $J_n'$ and $Y_n'$ are the first derivatives of the $n$th-order Bessel functions of the first and second kind, respectively. For the $\mathrm{TE}_{m1}$ mode, $n = m$ and $k$ is the first positive root. Table 2 lists the cutoff frequencies of the $\mathrm{TE}_{m1}$ ($m$ = 1, 2, 3, 4 or 5) modes below 2 GHz; the higher the order of the TE mode, the greater its cutoff frequency. In addition, the TEM mode component in GIS propagates at the speed of light, whereas the higher-order modes exhibit velocity dispersion, with velocities that depend on frequency. The propagation velocity of a TE mode is given by

$$v = c\sqrt{1 - \left(\frac{f_c}{f}\right)^2}, \qquad f > f_c$$

where $c$ is the speed of light in free space. Fig. 8 presents the velocities of the TEM and $\mathrm{TE}_{m1}$ modes. All frequency components of the TEM mode can propagate, whereas each $\mathrm{TE}_{m1}$ mode has a cutoff frequency, consistent with Table 2. Moreover, at the same frequency, the higher the order of the $\mathrm{TE}_{m1}$ mode, the slower its propagation velocity.
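The TE cutoff condition has no closed-form solution, so the cutoff frequencies have to be found numerically. A minimal SciPy sketch, assuming a = 45 mm and b = 160 mm (half of the 90 and 320 mm diameters of the GIS model), scans the left-hand side of the cutoff condition for a sign change and refines the first root with Brent's method; the velocity function implements the dispersion formula for a TE mode. The function names are hypothetical, and the values returned depend on the exact dimensions used, so they are not claimed to reproduce Table 2 exactly.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import jvp, yvp

C0 = 299_792_458.0  # speed of light in free space (m/s)

def te_m1_cutoff(m, a, b, f_max=2e9, n_grid=4000):
    """Cutoff frequency (Hz) of the TE_m1 mode of a coaxial line, or None if above f_max.

    a: central-conductor radius (m); b: enclosure radius (m).
    """
    def det(k):
        # J_m'(ka) Y_m'(kb) - J_m'(kb) Y_m'(ka); its first positive root is k_c of TE_m1
        return jvp(m, k * a) * yvp(m, k * b) - jvp(m, k * b) * yvp(m, k * a)

    k_max = 2 * np.pi * f_max / C0
    ks = np.linspace(0.5, k_max, n_grid)   # k = 0.5 rad/m is about 24 MHz, below any TE cutoff here
    vals = det(ks)
    crossings = np.nonzero(np.sign(vals[:-1]) != np.sign(vals[1:]))[0]
    if crossings.size == 0:
        return None                         # no TE_m1 cutoff below f_max
    i = crossings[0]
    k_c = brentq(det, ks[i], ks[i + 1])     # refine the bracketed root
    return k_c * C0 / (2 * np.pi)

def te_velocity(f, f_c):
    """Velocity c*sqrt(1 - (f_c/f)^2) above cutoff; NaN where the mode cannot propagate."""
    f = np.asarray(f, dtype=float)
    ratio = np.clip(1.0 - (f_c / f) ** 2, 0.0, None)
    return np.where(f > f_c, C0 * np.sqrt(ratio), np.nan)

if __name__ == "__main__":
    a, b = 0.045, 0.160  # assumed radii from the 90/320 mm diameters
    for m in range(1, 6):
        f_c = te_m1_cutoff(m, a, b)
        print(f"TE{m}1:", "above 2 GHz" if f_c is None else f"{f_c / 1e6:.0f} MHz")
```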
Considering the different propagation velocities and attenuation characteristics of the EM modes in GIS, the TF plane is divided into five rectangular regions according to the cutoff frequencies of the $\mathrm{TE}_{m1}$ modes. The split lines of these regions along the vertical (frequency) axis are at 0 Hz, 479, 1249, 1586 MHz, and 2 GHz, respectively, as illustrated in Fig. 9.
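Once the frequency of every row of the ST image is known, assigning rows to regions is a simple lookup. The NumPy sketch below (hypothetical helper name) takes the split frequencies as a parameter rather than hard-coding them, so the boundaries listed above, or any other set, can be passed in; treating each band as half-open with an inclusive top edge is an assumption.

```python
import numpy as np

def split_by_frequency(img, freqs, edges):
    """Split a TF image (rows = frequency bins, columns = time) into horizontal bands.

    edges: increasing split frequencies in Hz, e.g. [0, f1, f2, ..., f_top].
    Band i holds the rows with edges[i] <= f < edges[i+1]; the top edge is inclusive.
    Rows outside [edges[0], edges[-1]] are discarded.
    """
    img = np.asarray(img)
    freqs = np.asarray(freqs, dtype=float)
    edges = np.asarray(edges, dtype=float)
    inside = (freqs >= edges[0]) & (freqs <= edges[-1])
    band = np.digitize(freqs, edges[1:-1])  # band index 0 .. len(edges) - 2
    return [img[inside & (band == i), :] for i in range(len(edges) - 1)]
```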

Feature extraction
In this work, the grey-level distribution of the image is characterised by its three low-order moments, i.e. the mean, the standard deviation and the third root of skewness. Moments extracted from the whole image lack spatial information. Therefore, the greyscale image is first divided into five regions according to the EM modes, and the three low-order moments of every subregion are then computed as the original feature space.
$$u_i = \frac{1}{N_i}\sum_{j=1}^{N_i} p_{i,j}, \qquad \sigma_i = \left(\frac{1}{N_i}\sum_{j=1}^{N_i}\left(p_{i,j} - u_i\right)^2\right)^{1/2}, \qquad s_i = \left(\frac{1}{N_i}\sum_{j=1}^{N_i}\left(p_{i,j} - u_i\right)^3\right)^{1/3}$$

where $u_i$ is the mean of the $i$th region, $\sigma_i$ is the standard deviation of the $i$th region, $s_i$ is the third root of skewness of the $i$th region, $N_i$ is the total number of pixels in the $i$th region and $p_{i,j}$ is the grey level of the $j$th pixel in the $i$th region. The mean represents the average grey level of a region: the larger the mean, the greater the energy of the UHF PD signal in that region. The standard deviation measures how widely the pixel values are spread around their mean: the greater the standard deviation, the more dispersed the data. The skewness describes the degree of asymmetry of the pixel values around their mean: the larger its magnitude, the less symmetric the distribution. These properties motivate the choice of the three low-order moments as features.
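The three moment definitions translate directly into array operations. A minimal NumPy sketch for one region and for a list of regions follows; the helper names are hypothetical, the feature ordering ($u_i$, $\sigma_i$, $s_i$ region by region) is an arbitrary choice for illustration, and `np.cbrt` is used so that a negative third central moment yields a negative real cube root.

```python
import numpy as np

def low_order_moments(region):
    """Mean u, standard deviation sigma and cube root of the third central moment s."""
    p = np.asarray(region, dtype=float).ravel()   # grey levels p_ij of one region
    u = p.mean()
    sigma = np.sqrt(np.mean((p - u) ** 2))
    s = np.cbrt(np.mean((p - u) ** 3))           # real cube root, keeps the sign
    return u, sigma, s

def moment_features(regions):
    """Concatenate (u_i, sigma_i, s_i) for every region into one feature vector."""
    return np.array([value for r in regions for value in low_order_moments(r)])
```

With five regions this gives the 15-dimensional original feature space.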

Feature selection
To mitigate the problems caused by high dimensionality, the J criterion is used to reduce the dimensionality of the original feature space [31]. The J-value is the ratio of the between-class scatter $S_b$ to the within-class scatter $S_w$. For an $L$-class problem, the J-value of a feature is calculated as

$$J = \frac{S_b}{S_w}, \qquad S_b = \sum_{c=1}^{L} \frac{N_c}{N}\,(m_c - m_0)^2, \qquad S_w = \sum_{c=1}^{L} \frac{N_c}{N}\,\sigma_c^2$$

where $N_c$ is the number of samples belonging to class $c$, $N$ is the total number of samples, $m_c$ is the mean of the selected feature for class $c$, $m_0$ is the mean of the selected feature over all samples, and $\sigma_c$ is the standard deviation of the selected feature for class $c$. The J-values of all 15 moment features are shown in Fig. 10. A greater J-value indicates a better capability of the feature to separate the different classes. Applying this criterion, the six features with J-values >0.2, namely $\sigma_2$, $\sigma_4$, $\sigma_5$, $s_2$, $s_4$ and $s_5$, are chosen as the input of the classifiers; they are plotted in Fig. 11.
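The J-value of each feature and the threshold-based selection can be computed column by column. The following NumPy sketch assumes that $S_b$ and $S_w$ are the class-size-weighted between-class and within-class scatters (squared offsets of class means, and class variances, weighted by $N_c/N$); the helper names are hypothetical, and the 0.2 default mirrors the threshold used in this paper.

```python
import numpy as np

def j_values(X, y):
    """J = S_b / S_w for every column of X (samples x features), classes given by y.

    A feature that is constant within every class has S_w = 0 and yields inf.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    N = X.shape[0]
    m0 = X.mean(axis=0)                       # overall mean of each feature
    Sb = np.zeros(X.shape[1])
    Sw = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        weight = Xc.shape[0] / N              # N_c / N
        Sb += weight * (Xc.mean(axis=0) - m0) ** 2
        Sw += weight * Xc.var(axis=0)         # sigma_c^2 (population variance)
    return Sb / Sw

def select_features(X, y, threshold=0.2):
    """Indices of features with J > threshold, sorted by descending J, plus all J-values."""
    J = j_values(X, y)
    keep = np.flatnonzero(J > threshold)
    return keep[np.argsort(-J[keep])], J
```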

Principle of ELM
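ELM is a highly efficient learning algorithm [24]. Given $N$ observations $\{(\mathbf{x}_j, \mathbf{y}_j)\}_{j=1}^{N}$, where $\mathbf{x}_j = [x_{j1}, x_{j2}, \ldots, x_{jn}]^T \in \mathbb{R}^n$ and $\mathbf{y}_j = [y_{j1}, y_{j2}, \ldots, y_{jl}]^T \in \mathbb{R}^l$, the output of an SLFN with $M$ neurons in the hidden layer is expressed as

$$\sum_{i=1}^{M} \boldsymbol{\beta}_i\, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{y}_j, \qquad j = 1, \ldots, N$$

which can also be written in matrix form as $\mathbf{H}\boldsymbol{\beta} = \mathbf{Y}$, with

$$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_M \cdot \mathbf{x}_1 + b_M) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_M \cdot \mathbf{x}_N + b_M) \end{bmatrix}_{N \times M}, \quad \boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^T \\ \vdots \\ \boldsymbol{\beta}_M^T \end{bmatrix}_{M \times l}, \quad \mathbf{Y} = \begin{bmatrix} \mathbf{y}_1^T \\ \vdots \\ \mathbf{y}_N^T \end{bmatrix}_{N \times l}$$

where $g(x)$ is the activation function, $\mathbf{w}_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T$ is the input weight vector linking the $i$th hidden node and the input nodes, $b_i$ is the bias of the $i$th hidden node, $\boldsymbol{\beta}_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{il}]^T$ is the output weight vector linking the $i$th hidden node and the output nodes, and $\mathbf{H}$ is called the hidden-layer output matrix. In the ELM algorithm, $\mathbf{w}_i$ and $b_i$ are generated randomly, and the output weight matrix is estimated as

$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{Y} \qquad (9)$$

where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalised inverse of $\mathbf{H}$.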
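ELM training amounts to one random projection followed by one least-squares solve. A minimal NumPy sketch follows, assuming the 'sig' activation, one-hot target vectors, and uniform random input weights in [-1, 1] and biases in [0, 1] (the same ranges as the PSO search in the next subsection); the helper names are hypothetical and this is not the toolbox used in the paper.

```python
import numpy as np

def sigmoid(z):
    """The 'sig' activation g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-z))

def hidden_output(X, W, b):
    """Hidden-layer output matrix H (N x M): H[j, i] = g(w_i . x_j + b_i); W has shape (M, n)."""
    return sigmoid(X @ W.T + b)

def elm_train(X, Y, M, rng):
    """Random input weights and biases, then output weights beta = pinv(H) @ Y as in (9)."""
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(M, n))   # assumed range for input weights
    b = rng.uniform(0.0, 1.0, size=M)          # assumed range for biases
    H = hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ Y              # Moore-Penrose least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Network outputs; the predicted class is the index of the largest output."""
    return hidden_output(X, W, b) @ beta
```

With one-hot targets, `elm_predict(X, W, b, beta).argmax(axis=1)` gives the predicted class labels.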

Parameter optimisation of ELM
To obtain better generalisation performance, the hidden-layer parameters of the ELM need to be optimised. As an evolutionary algorithm, PSO has a strong global search ability that helps avoid local optima. In this work, PSO is used to obtain the optimal input weights $\mathbf{w} = [w_{11}, w_{12}, \ldots, w_{1n}, w_{21}, \ldots, w_{2n}, \ldots, w_{Mn}] \in [-1, 1]$ and input biases $\mathbf{b} = [b_1, b_2, \ldots, b_M] \in [0, 1]$. Each particle $(\mathbf{w}, \mathbf{b})$ is a potential solution in the $(n \times M + M)$-dimensional space. At each iteration, it is updated using [32]

$$V_k^{h+1} = e\,V_k^{h} + c_1 r_1\,(pBest_k - X_k^{h}) + c_2 r_2\,(gBest - X_k^{h})$$

$$X_k^{h+1} = X_k^{h} + V_k^{h+1}$$

$$e = e_{\max} - \frac{(e_{\max} - e_{\min})\,h}{h_{\max}}$$

where $h$ is the current iteration number; $V_k^h, X_k^h \in \mathbb{R}^{n \times M + M}$ are the velocity and position of particle $k$ at the $h$th iteration, respectively; $c_1$ and $c_2$ are the acceleration constants; $r_1$ and $r_2$ are two random numbers within (0, 1); $pBest_k$ is the best position found by particle $k$; $gBest$ is the global best position of the swarm; $e$ is the current inertia weight; $e_{\max}$ and $e_{\min}$ are the initial and final inertia weights, respectively; and $h_{\max}$ is the maximum number of iterations. In addition, the velocity and position are limited to given ranges, i.e. $V_k^{h+1} \in [V_{\min}, V_{\max}]$ and $X_k^{h+1} \in [X_{\min}, X_{\max}]$. For each particle, the root-mean-square error (RMSE) [24] is calculated as its fitness value

$$\mathrm{RMSE} = \sqrt{\frac{1}{N l}\sum_{i=1}^{N} \left\lVert \hat{\mathbf{y}}_i - \mathbf{y}_i \right\rVert_2^2}$$

where $\hat{\mathbf{y}}_i$ is the actual output of the ELM, $\mathbf{y}_i$ is the expected output and $l$ is the length of the vector $\mathbf{y}_i$. The pseudocode of the optimisation procedure is given in Fig. 12.
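To make the optimisation loop concrete, the sketch below wraps an ELM in a basic PSO with a linearly decreasing inertia weight, velocity and position clamping, and RMSE as fitness. The swarm size, iteration count, c_1 = c_2 = 2, e_max = 0.9, e_min = 0.4 and V_max = 0.5 are placeholder assumptions (the values actually used are those in Table 3), evaluating the fitness on the training data is also an assumption, and the function names are hypothetical.

```python
import numpy as np

def decode(p, n, M):
    """Split a particle into input weights W (M x n) and biases b (M,)."""
    return p[: n * M].reshape(M, n), p[n * M:]

def elm_rmse(p, X, Y, M):
    """Fitness of one particle: RMSE of the ELM it defines, plus its output weights."""
    W, b = decode(p, X.shape[1], M)
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))    # sigmoid hidden-layer outputs
    beta = np.linalg.pinv(H) @ Y
    return np.sqrt(np.mean((H @ beta - Y) ** 2)), beta

def pso_elm(X, Y, M, n_particles=20, h_max=50, c1=2.0, c2=2.0,
            e_max=0.9, e_min=0.4, v_max=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    dim = n * M + M
    lo = np.concatenate([np.full(n * M, -1.0), np.zeros(M)])  # w in [-1, 1], b in [0, 1]
    hi = np.ones(dim)
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = rng.uniform(-v_max, v_max, size=(n_particles, dim))
    fit = np.array([elm_rmse(p, X, Y, M)[0] for p in pos])
    p_best, p_fit = pos.copy(), fit.copy()
    g = np.argmin(p_fit)
    g_best, g_fit = p_best[g].copy(), p_fit[g]
    history = [g_fit]
    for h in range(h_max):
        e = e_max - (e_max - e_min) * h / h_max   # linearly decreasing inertia weight
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = e * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
        vel = np.clip(vel, -v_max, v_max)         # V in [V_min, V_max]
        pos = np.clip(pos + vel, lo, hi)          # X in [X_min, X_max]
        fit = np.array([elm_rmse(p, X, Y, M)[0] for p in pos])
        better = fit < p_fit                      # update personal bests
        p_best[better], p_fit[better] = pos[better], fit[better]
        if p_fit.min() < g_fit:                   # update the global best
            g = np.argmin(p_fit)
            g_best, g_fit = p_best[g].copy(), p_fit[g]
        history.append(g_fit)
    W, b = decode(g_best, n, M)
    beta = elm_rmse(g_best, X, Y, M)[1]
    return W, b, beta, history
```

The returned `history` traces the best fitness per iteration, analogous to the fitness curve in Fig. 14.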

Results and discussion
The simulations are carried out in MATLAB 2016b on an Intel Core i5-6500 CPU with a clock speed of 3.2 GHz. The measured PD signals of each class are randomly divided into two groups: 200 samples for the training data set and 100 samples for the testing data set. Hence, the total numbers of training and testing samples are 800 and 400, respectively. Fig. 13 plots the training accuracy against the number of hidden nodes for different activation functions. To avoid overfitting, the optimal number of hidden neurons is first estimated using the empirical formula in [33]. The training accuracy rises as the number of hidden nodes increases, but when the number of hidden nodes exceeds 16, the training accuracy grows only very slowly, regardless of the activation function. Moreover, the 'sig' function performs the best and the 'hardlim' function the worst. Ultimately, the 'sig' function is selected and the number of hidden nodes is set to 16. The key parameters of the two algorithms (PSO and ELM) are listed in Table 3, and the fitness during the optimisation process is illustrated in Fig. 14. Part of the output of the ELM is listed in Table 4. In the second case, the correct class is the protrusion defect, but the sample is misclassified as a particle defect. Similarly, in the sixth case, the correct class is the particle defect, but the sample is misclassified as a floating metal defect. The classification results of the ELM are shown in Table 5: for the particle defect, six samples are misclassified as protrusion defects, one as a floating metal defect and four as void defects. The main classification errors are thus concentrated on the particle defect, which is attributed to the large dispersion of the UHF signals induced by free metallic particles. The overall accuracy of the ELM classifier is 95%.
The multiclass SVM classifier model [34] is built with the LIBSVM 3.22 toolbox to perform multiclass classification directly. A radial basis function kernel is used, and the penalty factor C and the kernel parameter are selected by ten-fold cross-validation and grid search over the range $2^{-8}$ to $2^{8}$. As Fig. 15 shows, the highest training accuracy of 98.75% is reached when $C = 2^4$ and the kernel parameter equals $2^6$. For KNN, after extensive trials, the highest testing accuracy of 88.5% is obtained with the Euclidean distance and five nearest neighbours. Table 6 lists the centroids of the training data set for the six moment features. After the SVM and KNN classifier models are built, the remaining samples are tested. Table 7 summarises the overall accuracy and time cost of the three classifiers. Although the training accuracy of SVM is slightly higher than that of ELM, the ELM performs best in testing accuracy. Since K-fold cross-validation randomly divides the data into K subsets, a non-representative data split may still be used for training and validation [35]; as a result, the testing accuracy of SVM is 6.75 percentage points lower than its training accuracy. In terms of time cost, the ELM requires only 5.25% of the training time of the SVM, and it tests about 22.3 and 10.7 times as fast as SVM and KNN, respectively (3 ms versus 67 and 32 ms). Tables 8 and 9 list the precision and recall, respectively. For each class of defect, all three classifiers achieve high precision and recall simultaneously, which suggests that overfitting does not occur. In addition, Table 10 lists the computational complexity. Since KNN is a lazy learner [37], no explicit training step is required, so both the time cost and the computational complexity of its training phase are 0.
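The KNN configuration reported here (Euclidean distance, five neighbours) is simple enough to sketch directly. The NumPy code below is a hypothetical illustration of plain majority-vote KNN, not the toolbox actually used, and it does not cover the LIBSVM-based SVM; breaking ties in favour of the smallest label is an assumption.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Majority vote among the k nearest training samples under Euclidean distance.

    y_train must hold non-negative integer class labels; ties go to the smallest label.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # pairwise Euclidean distances, shape (n_test, n_train)
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]     # indices of the k closest samples
    votes = y_train[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])
```

Because all the work happens at query time, the training phase costs nothing, while the testing cost grows with the size of the training set, which matches the lazy-learning behaviour noted above.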
The parameters in Table 10 are defined as follows: $D_i$ = 6 (the dimension of the input features), … Thereby, for the training and testing phases of ELM, the complexities are τ(32,000) and τ(160), respectively; for the testing phase of KNN, the complexity is τ(1200). Besides, the random allocation of training and testing samples leads to a different number of support vectors in each run [36], and the time in …

Fig. 16 plots the three low-order moments of the whole greyscale image, and the training and testing accuracies based on them are tabulated in Table 11. For the ELM and SVM, the training accuracies decline by 26.25 and 25.25 percentage points, respectively, compared with those based on the selected six moments; for the three classifiers, the testing accuracies drop by 26.5, 29 and 37 percentage points, respectively. This indicates that the selected six moments are far more effective for recognising the PD type.
In the feature selection stage, to determine an appropriate threshold, the 15 features in Fig. 10 are reordered in descending order of J-value and listed in Table 12; the influence of the number of features on the training accuracy of the ELM is then investigated, and the results are plotted in Fig. 17. As the number of features increases, the accuracy first rises and then declines, peaking at six features. The J-values of the sixth feature, $\sigma_2$, and the seventh feature, $u_4$, are 0.347 and 0.198, respectively. Therefore, the threshold is set to 0.2, and when the J criterion is applied in industrial practice, all features with J-values greater than the threshold can be chosen as the input of the classifier.

Conclusions
This paper presents a novel feature extraction method that considers the EM modes in analysing UHF PD signals. The conclusions are as follows: (i) Considering the propagation properties of EM waves in GIS, the TF plane obtained through the ST is divided into five regions according to the cutoff frequencies of the $\mathrm{TE}_{m1}$ modes. The mean, the standard deviation and the third root of skewness of every subregion are computed as the original feature parameters, and the dimensionality is then reduced using the J criterion. The high recognition accuracies of ELM, SVM and KNN demonstrate the effectiveness of the selected moments.
(ii) The recognition accuracies of the three classifiers based on the selected moments are 26.5, 29 and 37 percentage points higher, respectively, than those based on the three low-order moments of the whole greyscale image. This indicates that using the EM modes to divide the TF plane significantly improves accuracy. (iii) The training times of ELM, SVM and KNN are 16, 305 and 0 ms, respectively, and the corresponding testing times are 3, 67 and 32 ms. Thus, compared with SVM and KNN, the ELM has the fastest testing speed along with a satisfactory learning speed, making it more suitable for real-time PD detection.
In view of the difference between the real PD environment in practice and our experimental condition, in the future, further studies will focus on obtaining more representative UHF signals to verify the robustness of the selected moment features after considering EM modes, by means of monitoring the on-site PD activities or changing the relative position between PD source and UHF sensor, such as the angle in the circumferential direction and the distance in the axial direction.

Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 51677061, 51507058).