Power quality data processing method based on distributed compressed sensing and a learning dictionary

For the randomly distributed and numerous buses in a power system, the authors introduce distributed compressed sensing to compress and reconstruct power quality data. They built a distributed IEEE14 bus system in PSCAD. This model was used to analyse the correlation and sparsity of power quality data and to obtain the four types of data used in the subsequent simulation; the power quality data used in this study are the voltage amplitudes of each bus in a power system. To make the signal sparser, they constructed a distributed compressed sensing learning dictionary for power quality data. The simulation results show that the distributed compressed sensing learning dictionary constructed in this study is better suited to power quality data. Applying distributed compressed sensing in a power system preserves the accuracy of the reconstructed data when the quantity of data is reduced by 1/3, which greatly reduces the system storage space. Additionally, the speed of reconstruction increases by 3/5.


Introduction
As one of the most indispensable forms of energy for human survival and development, electrical energy has penetrated all aspects of human life and production. It also plays an irreplaceable role in scientific and technological progress and social development [1]. However, electrical equipment such as nonlinear loads, lighting control systems, computers, data processing equipment, and factory rectifiers has seriously polluted the power network.
Power quality problems are mainly caused by high-power equipment and various line faults. Among these problems, harmonics, interharmonics, flicker, and other steady-state disturbances are periodic, while transient disturbances such as swell, sag, interruption, and damped oscillation are short-term and random [2]. Power quality problems cause some user equipment to malfunction and result in unstable operation of the power network [3]. To improve power quality, the power department needs to monitor various power quality information in real-time and analyse power quality effectively [4]. Massive power quality data need to be collected, compressed, stored, transmitted, detected, and identified [5]. Therefore, many scholars have proposed data processing methods with high compression performance to achieve good compression results, but they are all based on the Nyquist sampling theorem, which results in sampling redundancy and wasted storage space [6,7].
The theory of distributed compressed sensing (DCS) [8] is based on compressed sensing (CS) [9,10]. CS indicates that as long as a signal is compressible or sparse in some transform domain, the high-dimensional signal can be projected onto a low-dimensional space using an observation matrix that is incoherent with the transform basis. The original signal can then be accurately reconstructed from these few projections by solving an optimisation problem. DCS indicates that if there is a strong correlation between signals, they can be reconstructed accurately using less data than CS requires. Many researchers have applied CS to process power quality data [11], but the application of DCS to power quality data is still at an early stage. Compared with the traditional CS method, the DCS method has a higher compression ratio and requires less reconstruction time. It is a more efficient and reliable method for processing power quality data. DCS not only reduces the computational complexity but also improves compression performance. Most importantly, it can process multiple signals simultaneously [12], a characteristic that is ideal for processing power quality data.
In the application of DCS, the sparse representation of signals is the key. The fast Fourier transform (FFT) basis [13], the discrete cosine transform basis [14] or the wavelet transform basis [15] is commonly used for sparse transformation. However, these methods suffer from spectrum leakage. If only one sparse basis is used to sparsify power quality data, it cannot effectively reflect the characteristics of the data and will reduce the sparsity of the original data. The signal sparsity is directly related to the accuracy and quality of the reconstructed signals [16,17], so in this paper we construct a DCS learning dictionary to sparsify the power quality signals. Our method can analyse power quality data more rapidly and accurately, which is very important for ensuring the stable operation of power systems.

DCS theory
CS theory is the basis of DCS. CS theory shows that if a signal x ∈ R^(n×1) is sparse on a sparse basis or dictionary ψ, i.e. x = ψθ, then it can be recovered from the measured value y ∈ R^(m×1) (m < n), where y = Φx = Φψθ and Φ ∈ R^(m×n) is a measurement matrix. The sparse vector θ is the solution to the l0-norm minimisation problem

θ̂ = arg min ∥θ∥_0 s.t. y = Φψθ (1)

To find the optimal solution of formula (1), we would need to enumerate all possible combinations of the locations of the non-zero entries in θ. The numerical calculation of formula (1) is therefore very unstable and is a non-deterministic polynomial-hard problem. That does not mean there is no solution, however: formula (1) can be relaxed to a simpler l1 optimisation problem, which has the same solution under certain conditions:

θ̂ = arg min ∥θ∥_1 s.t. y = Φψθ (2)

Formula (2) turns formula (1) into a convex optimisation problem, so it is equivalent to solving a linear programming problem.
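As a concrete illustration of how the sparse vector can be recovered, the following Python sketch implements a toy orthogonal matching pursuit (OMP), the greedy reconstruction algorithm used later in this paper. The dimensions, random seed, and 1/√m column normalisation are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def omp(Phi, y, k):
    """Toy orthogonal matching pursuit: greedily recover a k-sparse theta with y = Phi @ theta."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):
        # pick the column most correlated with the current residual
        idx = int(np.argmax(np.abs(Phi.T @ residual)))
        if idx not in support:
            support.append(idx)
        # least-squares refit of the coefficients on the enlarged support
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    theta = np.zeros(Phi.shape[1])
    theta[support] = coef
    return theta

rng = np.random.default_rng(0)
n, m, k = 128, 40, 4                              # ambient dimension, measurements, sparsity
theta_true = np.zeros(n)
theta_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)    # random Gaussian measurement matrix
y = Phi @ theta_true                              # compressed measurements (m << n)
theta_hat = omp(Phi, y, k)
err = float(np.linalg.norm(theta_hat - theta_true))
```

Here the sparse basis ψ is taken as the identity for simplicity, so θ is the signal itself; with a non-trivial basis, Φ would be replaced by Φψ.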
While CS theory processes only one signal, many real-world scenarios involve multiple signals. In this case, CS can be applied to each signal separately. This solution is simple and effective when there is no correlation between the signals, but if the set of signals is interrelated, it is less efficient. In multisensor distributed systems, CS should consider the correlation characteristics between signals. To solve this problem, the theory of DCS was proposed. Baron et al. proposed a joint sparsity model (JSM) for three different scenarios: JSM-1 (sparse common component and sparse innovation component), JSM-2 (common sparse support), and JSM-3 (non-sparse common component and sparse innovation component). DCS is an extension of the traditional CS theory for processing sets of related signals. If the signal set shows a strong correlation, DCS provides a better compression ratio. In this study, we apply JSM-1 to the proposed method. The basic framework of JSM-1 is as follows. Assume J signals x_j, j ∈ {1, 2, …, J}, x_j ∈ R^N, and a basis ψ that can sparsely represent all of the signals. It is also assumed that every signal consists of a common sparse component z plus an innovation sparse component z_j:

x_j = z + z_j, j ∈ {1, 2, …, J}

Their components can be written as

z = ψθ_z, ∥θ_z∥_0 = K; z_j = ψθ_j, ∥θ_j∥_0 = K_j

where K represents the sparsity of the common component z, and K_j represents the sparsity of the innovation component z_j of each signal.
To simultaneously recover the set of signals using a linear programme, we define the stacked measurement vector y = [y_1^T, y_2^T, …, y_J^T]^T, where each sensor measures y_j = Φ_j x_j with its own measurement matrix Φ_j, together with the joint matrix

Φ̃ = [Φ_1 Φ_1 0 ⋯ 0; Φ_2 0 Φ_2 ⋯ 0; ⋮ ; Φ_J 0 0 ⋯ Φ_J]

whose first column block acts on the common component and whose jth remaining block acts on the jth innovation component. The joint sparse coefficient vector θ = [θ_z^T, θ_1^T, …, θ_J^T]^T is calculated from Φ̃ and ψ by the following formula:

θ̂ = arg min ∥θ∥_1 s.t. y = Φ̃ψ̃θ (9)

where ψ̃ is the block-diagonal extension of ψ. θ contains K + Σ_{j=1}^{J} K_j non-zero elements. The original signals can then be obtained from

x_j = ψ(θ_z + θ_j)

If the measured signals have a strong correlation, this method can achieve greater compression. In fact, in this method, DCS is achieved by superimposing the measurements to form a vector and running a linear programme. This makes the method easy to implement but increases the computational cost: the computational complexity of a linear programme is cubic in the problem size, so the cost grows quickly as the vector grows.
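The stacked-measurement structure described above can be sketched as follows. The identity sparse basis, dimensions, and signal values are simplifying assumptions chosen so that the block structure of the joint matrix is easy to verify.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, J = 64, 24, 2   # signal length, measurements per sensor, number of sensors

# common sparse component z and per-sensor innovations z_j (identity sparse basis)
z = np.zeros(N)
z[rng.choice(N, 3, replace=False)] = 1.0
innovations = []
for _ in range(J):
    zj = np.zeros(N)
    zj[rng.choice(N, 2, replace=False)] = 0.5
    innovations.append(zj)
signals = [z + zj for zj in innovations]          # JSM-1: x_j = z + z_j

# per-sensor Gaussian measurement matrices and the stacked measurement vector
Phis = [rng.standard_normal((M, N)) for _ in range(J)]
y_stack = np.concatenate([Phis[j] @ signals[j] for j in range(J)])

# joint matrix: column blocks [common | innovation_1 | ... | innovation_J]
big = np.zeros((J * M, (J + 1) * N))
for j in range(J):
    big[j*M:(j+1)*M, :N] = Phis[j]                # block acting on the common part
    big[j*M:(j+1)*M, (j+1)*N:(j+2)*N] = Phis[j]   # block acting on innovation j
theta = np.concatenate([z] + innovations)
```

By construction, multiplying the joint matrix by the concatenated coefficient vector reproduces the stacked measurements, which is exactly the linear system the l1 programme in formula (9) inverts.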

Correlation of power quality data
The DCS theory enables the new distributed coding algorithm to simultaneously exploit the correlation structure within each signal and between different signals. In a typical DCS scenario, the signals measured by many sensors are individually sparse, and the joint sparsity model can be used across sensors. Each sensor encodes its signal by projection onto another incoherent basis, and then only a few of the resulting coefficients are sent to a single collection point. DCS does not require collaboration between sensors during signal acquisition. All the measured data and the correlation between the signals can be used to recover all the original signals. Next, we simulate a distributed power system model in PSCAD to analyse the correlation and sparsity of power quality data. Fig. 1 shows the structure of an IEEE14 bus system built in PSCAD. It is a distribution network model; numbers 1-14 represent 14 buses. The system voltage is 23 kV, the frequency is 60 Hz, and the rated capacity is 100 MVA. First, we analyse the cross-correlation of the power quality data. To make the results more persuasive, harmonic data are selected in this study. Based on the circuit in Fig. 1, each load in the IEEE14 bus system is connected in parallel with a harmonic source with a frequency of 180 Hz and operated on the system. The sampling frequency is 7200 Hz, the sampling time is 10 cycles, and every bus collects 1200 continuous voltage amplitude data points. Since the voltage of each bus has three phases, we collect 1200 data points per phase, i.e. every bus collects a total of 3600 data points. To verify whether every bus has a good correlation with the other 13 buses, we use MATLAB to calculate the average of the cross-correlation coefficients of the phases of each bus; the result indicates the degree of cross-correlation between each bus and the other buses.
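The bus-averaged cross-correlation computation can be sketched as below. The synthetic "bus voltages" (a shared 60 Hz fundamental plus a 180 Hz harmonic and small independent noise) are stand-ins for the PSCAD data; only the averaging procedure mirrors the text, and the noise level is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(1200) / 7200.0               # 10 cycles of a 60 Hz signal at 7200 Hz sampling
fundamental = np.sin(2 * np.pi * 60 * t)

# synthetic bus voltages: shared fundamental + 180 Hz harmonic + small independent noise
buses = [fundamental
         + 0.1 * np.sin(2 * np.pi * 180 * t)
         + 0.01 * rng.standard_normal(t.size)
         for _ in range(14)]

def avg_cross_corr(signals, i):
    """Mean Pearson correlation coefficient of bus i with every other bus."""
    r = [np.corrcoef(signals[i], signals[j])[0, 1]
         for j in range(len(signals)) if j != i]
    return float(np.mean(r))

scores = [avg_cross_corr(buses, i) for i in range(14)]
```

In the paper's experiment, the buses with the highest such scores (4, 5, 7, and 12) are the ones selected for the later simulations.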
It can be seen from Fig. 1 and Tables 1-3 that the correlation between buses decreases as the electrical distance between them increases. This cross-correlation is called spatial electrical distance cross-correlation. From Table 4, we know that in the above IEEE 14-bus power system, the overall cross-correlation of buses 4 and 5 is the strongest, and the cross-correlation of buses 7 and 12 is also strong, so the four power quality signals used in this study are obtained from buses 4, 5, 7, and 12. This allows the original signals to be recovered more quickly. These four buses have a good correlation with all buses, which is beneficial for judging the status of the entire power system from their trends.
In summary, the data of each bus are correlated. The loads in the power system and the distributed renewable power sources change continuously over time, so each bus signal is self-correlated, and these changes also affect other generators and loads in the same area. The electrical distance between buses is another important factor generating the spatial electrical distance correlation. Therefore, the power quality data are correlated.

Sparsity of power quality data
We have shown that there is a correlation between the buses of the power system, so we next analyse the sparsity of the bus signals. Since a MATLAB simulation generates only regular waveforms, whereas an actual power system contains various noises and interference, we use the model in Fig. 1 to obtain the power quality data. We simulated short-circuit and capacitor-switching conditions and added a harmonic source to the IEEE14 bus system, and then obtained four different types of power quality data: voltage sag, voltage swell, voltage harmonics, and voltage gap. They are simulated at buses 4, 5, 7, and 12, which have the best correlations. The fundamental frequency is 60 Hz, the sampling rate is 15,000 Hz, and the sampling time is six cycles, so every signal has 1500 sampling points. The four waveforms are drawn in Fig. 2. The sparsity of the signal is a prerequisite for using DCS, so the sparsity of the power quality signal is very important. Fig. 2 shows that the power quality signal is a one-dimensional digital time signal with no sparsity in the time domain, so its sparsity must be analysed in the frequency domain. In the field of signal processing, the Fourier transform reveals the characteristics of a signal in the frequency domain, so we transform the time-domain signals into frequency-domain signals with the Fourier transform; MATLAB is used to perform an FFT on the four obtained signals. For convenience, this study analyses only the four typical power quality signals above. Each signal has 1500 sampling points arranged in a time series.
As seen in Fig. 3, the four power quality signals have only a small number of significant frequency-domain components; almost all of the high-frequency components are close to zero. Therefore, the power quality signal can be considered sparse, and it is feasible to compress and reconstruct it using DCS.
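A minimal frequency-domain sparsity check in the spirit of this analysis: a fundamental plus one harmonic, sampled over an integer number of cycles, concentrates almost all of its energy in a handful of FFT bins. The signal model (fundamental plus third harmonic) and the 1% magnitude threshold are illustrative assumptions.

```python
import numpy as np

fs, f0, n = 15000, 60, 1500       # sampling rate, fundamental, six cycles -> 1500 points
t = np.arange(n) / fs
# fundamental plus a third harmonic, mimicking a voltage-harmonics disturbance
x = np.sin(2 * np.pi * f0 * t) + 0.2 * np.sin(2 * np.pi * 3 * f0 * t)

X = np.fft.rfft(x)
mag = np.abs(X) / n
# count the coefficients that carry non-negligible energy
significant = int(np.sum(mag > 0.01 * mag.max()))
```

Because the window covers exactly six cycles there is no spectral leakage, so only the bins at 60 Hz and 180 Hz are significant out of 751 and the signal is extremely sparse in the frequency domain. Real measured data would be less clean, which is precisely the motivation for the learning dictionary below.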

DCS learning dictionary
In conventional power quality data compression and reconstruction methods, the sparse transform generally uses the FFT basis, the discrete cosine transform basis or the discrete wavelet transform basis to sparsify the power quality data. The problem of matching the power quality data to the sparse transform basis is not fully considered, and the reconstruction error may be large because the power quality data are less sparse under a fixed transform basis. If the power quality signal is sparsely represented only on some orthogonal transform basis, the characteristics of the signal cannot be reflected effectively, and the resulting sparse representation is not optimal. Therefore, based on the correlation of power quality signals and the characteristics of DCS, this study proposes a new construction method for a DCS learning dictionary. In the first joint sparsity model of DCS, each signal is divided into a common component and an innovation component: the component shared by all signals is called the common component, and the difference between each signal and the common component is called the innovation component. For the power quality signal, the simulation in the IEEE14 bus system has verified a high correlation. From the formulas of the various types of power quality data and the waveforms drawn from the extracted data, all power quality data contain the fundamental wave. Therefore, for power quality signals, the fundamental wave at normal voltage represents the common component, and the difference between each of the other seven kinds of power quality signals and the fundamental wave is called the innovation component.
For DCS, although multiple signals are processed at the same time, the common component only needs to perform one compression and reconstruction because all signals contain the same common component. After the common component is compressed and reconstructed, the innovation component of each signal continues to be reconstructed, and then the original signal can be reconstructed accurately. This method greatly reduces the number of measured values. The accuracy of the reconstruction result of the common component is very important because each signal needs to use the reconstruction result of this component, so the reconstruction result of this component will affect all signals. To reconstruct the original signal more accurately, we need to make the common component as sparse as possible to reduce the reconstruction error of the common component. Therefore, this method divides the DCS learning dictionary into two parts. The first part uses the fundamental wave data as a training sample to construct a common dictionary, and the second part uses the innovation part as a training sample to construct an innovation dictionary. Finally, the two sub-dictionaries are cascaded to form a DCS learning dictionary.
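The common/innovation decomposition and the cascading of the two sub-dictionaries can be illustrated with a two-atom toy example (the paper's sub-dictionaries are learned and far larger, with Q_1 = 180 and Q_2 = 260 atoms); the sag waveform, dimensions, and normalisation here are assumptions for illustration only.

```python
import numpy as np

fs, f0, n = 6000, 60, 200                 # two cycles of a 60 Hz wave at 6 kHz sampling
t = np.arange(n) / fs
fundamental = np.sin(2 * np.pi * f0 * t)  # common component: the normal-voltage fundamental

sag = fundamental.copy()
sag[n // 2:] *= 0.5                       # toy voltage-sag disturbance in the second cycle
innovation = sag - fundamental            # innovation component: difference from fundamental

# cascade a common sub-dictionary and an innovation sub-dictionary column-wise
D_common = fundamental[:, None] / np.linalg.norm(fundamental)
D_innov = innovation[:, None] / np.linalg.norm(innovation)
D = np.hstack([D_common, D_innov])        # cascaded "learning dictionary" (toy, 2 atoms)

# the sag signal is exactly 2-sparse on the cascaded dictionary
coef, *_ = np.linalg.lstsq(D, sag, rcond=None)
recon = D @ coef
```

The point of the cascade is visible even in this toy: the disturbance decomposes into one common atom (shared by every signal and reconstructed once) plus one innovation atom, which is what keeps the number of required measurements small.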
The construction of the DCS learning dictionary is based on the k-singular value decomposition (KSVD) algorithm. KSVD is a greedy algorithm that alternately optimises the dictionary and the sparse coefficient matrix [18][19][20]. The construction method has two main steps. The first step is sparse coding: the dictionary D is fixed, and greedy iteration or convex optimisation algorithms are used to solve the objective function, yielding the locally optimal sparse coefficient matrix T of the signal f on the dictionary D. The second step updates each basis atom of the dictionary: the matrix T obtained in the previous step is fixed, and the dictionary is updated iteratively, according to the preset number of iterations and error requirement, until the optimal dictionary D is found.
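The two-step alternation can be sketched as a compact toy KSVD. The training set, atom count, sparsity, and iteration count are illustrative assumptions, and the sparse-coding step uses a simple greedy pursuit of the kind mentioned above.

```python
import numpy as np

def sparse_code(D, F, k):
    """Step 1: greedy pursuit giving each column of F a k-sparse code on dictionary D."""
    T = np.zeros((D.shape[1], F.shape[1]))
    for c in range(F.shape[1]):
        r, support = F[:, c].copy(), []
        for _ in range(k):
            support.append(int(np.argmax(np.abs(D.T @ r))))
            coef, *_ = np.linalg.lstsq(D[:, support], F[:, c], rcond=None)
            r = F[:, c] - D[:, support] @ coef
        T[support, c] = coef
    return T

def ksvd(F, n_atoms, k, n_iter=10):
    """Alternate sparse coding and per-atom rank-1 SVD updates (the KSVD scheme)."""
    rng = np.random.default_rng(3)
    D = rng.standard_normal((F.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)                # two-norm normalise every atom
    for _ in range(n_iter):
        T = sparse_code(D, F, k)                  # step 1: fix D, solve for T
        for j in range(n_atoms):                  # step 2: fix T, update each atom
            users = np.nonzero(T[j, :])[0]        # training signals that use atom j
            if users.size == 0:
                continue
            # error matrix with atom j's contribution removed
            E = F[:, users] - D @ T[:, users] + np.outer(D[:, j], T[j, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                     # best rank-1 fit updates the atom...
            T[j, users] = s[0] * Vt[0, :]         # ...and its coefficients
    return D, T

# toy training set: 2-sparse combinations of a hidden 8-atom dictionary
rng = np.random.default_rng(4)
D_true = rng.standard_normal((20, 8))
D_true /= np.linalg.norm(D_true, axis=0)
F = np.column_stack([D_true[:, rng.choice(8, 2, replace=False)] @ rng.standard_normal(2)
                     for _ in range(200)])
D, T = ksvd(F, n_atoms=8, k=2, n_iter=10)
resid = float(np.linalg.norm(F - D @ T) / np.linalg.norm(F))
```

In the paper this alternation is run once on the fundamental-wave samples (common dictionary) and once on the innovation samples (innovation dictionary).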

DCS learning dictionary construction steps
First, the simulation tool is used to establish the power quality training sample sets E ∈ R^(M_1×W_1) and G ∈ R^(M_2×W_2), where E is the common sample set, G is the innovation sample set, W is the number of training samples, and M is the number of sampling points in each training sample. The common sample set contains only fundamental wave data. The innovation sample set contains the innovation components of seven different types of signals: voltage harmonic, voltage oscillation, voltage swell, voltage sag, voltage interruption, voltage gap and voltage spike. They can be expressed as

E = [e_1^1, …, e_1^n, …, e_k^1, …, e_k^n], G = [g_1^1, …, g_1^n, …, g_k^1, …, g_k^n]

where e_i^j ∈ R^(M×1) is the jth sample of the ith class in the common sample set, g_i^j ∈ R^(M×1) is the jth sample of the ith class in the innovation sample set, i = 1, 2, …, k, j = 1, 2, …, n, M is the sample dimension, and R is the set of real numbers. The sample data used in this study are collected from the IEEE14 bus system of Fig. 1 in PSCAD. The common sample set has 200 samples, obtained from random buses. The innovation sample set has 300 samples in seven types, randomly selected from the buses with higher correlation, so W_1 = 200 and W_2 = 300. In the IEEE14 bus system, the rated voltage is 23 kV, the frequency is 60 Hz, the sampling frequency is 6000 Hz, and the sampling time is two cycles. Thus, the number of sampling points M of each sample is 100. The sample set E consists of 20,000 fundamental wave data points, and the sample set G consists of 30,000 data points of seven types.
Then, the common dictionary and the innovation dictionary are initialised separately. Take the innovation dictionary as an example. We select Q training samples from the innovation sample set G to initialise the dictionary D_t0 ∈ R^(M×Q); the number of basis atoms in the dictionary is Q. To simplify the processing of the subsequent data and ensure convergence within the programme run time, we perform a two-norm normalisation ∥D_t0^j∥_2 = 1 for each column j = 1, 2, …, Q of D_t0. The optimisation objective function of the initialised dictionary is

J(D_t, T) = min ∥G − D_t T∥_F^2 + λ∥T∥_1

where T_0 is the sparse representation matrix obtained by solving the objective function and λ is the regularisation parameter, which balances reconstruction error against sparsity. The initial iteration count is L = 1; the total number of iterations L and the tolerance error J_S can be selected according to the basis atom features of the initialised dictionary and experimental simulation. Finally, the KSVD algorithm is used to optimise the objective function in two steps. In the first step, the innovation dictionary D_ti obtained in the ith iteration is fixed. The objective function then simplifies to

T_i = arg min_T ∥G − D_ti T∥_F^2 + λ∥T∥_1

Solving the sparse representation matrix T_i from this simplified objective function is equivalent to solving a single-variable optimisation problem; it is an ordinary sparse solution problem, and an approximate sparse solution can be obtained with any greedy pursuit algorithm. In the second step, the sparse representation matrix T_i from the last iteration is fixed, and each basis atom of the innovation dictionary D_ti is optimised separately. The objective function can then be simplified to the following update:

∥G − D_ti T_i∥_F^2 = ∥G_k − d_k t_k∥_F^2, G_k = G − Σ_{j≠k} d_j t_j

where k = 1, 2, …, Q and G_k is the error term. Singular value decomposition is then performed on G_k.
The basis atom d_k is updated with the singular vector corresponding to the largest singular value; the least squares method can be used to solve this problem. In this way, the optimal innovation dictionary D_t and the optimal common dictionary D_g are obtained. Cascading D_g and D_t yields the DCS learning dictionary D.

Main parameters of the DCS learning dictionary
The main parameters of the DCS learning dictionary are the numbers W_1, W_2 of power quality training samples, the lengths S_1, S_2 of the atoms in the dictionary, the numbers Q_1, Q_2 of atoms, and the numbers L_1, L_2 of iterations during training. There are 20,000 data points in training sample set E and 30,000 in training sample set G. The length of atom S is related to the compression ratio; we temporarily set S_1 = S_2 = 100. When the compression ratio changes, the other parameters are kept unchanged, the atom length is changed, and the dictionary is obtained in the same way. To select the numbers of common and innovation dictionary atoms, the common dictionary is tested with fundamental wave signals and the innovation dictionary with the innovation components of seven different types of power quality signals. Each experiment is repeated 30 times per dictionary, and the average reconstruction error of the 30 experiments is calculated. The reconstruction error is represented by the mean square error (MSE):

MSE = (1/N) Σ_{j=1}^{N} (x_j − x̂_j)^2

where N is the number of data points, x_j is the original data, and x̂_j is the reconstructed value. Fig. 4 plots the reconstruction error against the number of atoms: the error decreases as the number of atoms increases. When the number of common dictionary atoms is ∼180 and the number of innovation dictionary atoms is ∼260, the reconstruction errors of the common and innovation dictionaries are minimal and gradually stabilise. Considering both reconstruction accuracy and computational complexity, we select Q_1 = 180 and Q_2 = 260. Finally, we select the number of iterations L. We randomly select 100 samples from the common component sample set and from the innovation component sample set separately to initialise their dictionaries, and initialise the objective function J.
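The MSE criterion above can be computed directly; the numbers in the worked example are illustrative.

```python
import numpy as np

def mse(x, x_hat):
    """Mean square error: (1/N) * sum_j (x_j - x_hat_j)^2."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return float(np.mean((x - x_hat) ** 2))

# worked example: one point off by 1 out of four points -> MSE = 1/4
example = mse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```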
The tolerance error is J_s = 0.01; the maximum number of iterations is set to 8 for the common dictionary and 10 for the innovation dictionary. The training curves are shown in Fig. 5. In the common dictionary training curve, the objective function error gradually stabilises as the number of iterations increases. At the fourth iteration, the difference in the objective function value between the third and fourth iterations is less than the tolerance error, so training stops and the common dictionary D_g is obtained; the number of iterations for the common dictionary is L_1 = 4. In the innovation dictionary training curve, at the sixth iteration, the difference in the objective function value between the fifth and sixth iterations is less than the tolerance error, so training stops and the innovation dictionary D_t is obtained; the number of iterations for the innovation dictionary is L_2 = 6.

Simulation results
In DCS, the measurement matrix Φ needs to satisfy the restricted isometry property (RIP), but it is very difficult to directly construct a measurement matrix that satisfies it. It has been proven theoretically that the random Gaussian matrix has a very useful property: for an M × N random matrix Φ whose elements obey an independent Gaussian distribution, when M ≥ cK log(N/K), the RIP is satisfied with high probability (c is a small constant and K is the sparsity of the signal). Therefore, all scenarios in this study use a random Gaussian matrix as the measurement matrix. The IEEE14 bus system constructed in PSCAD is used to obtain four power quality signals; the sampling frequency is 15,000 Hz, the sampling time is six cycles, and each bus has 1500 sampling points. The four signals are all complete signals. The four signals are then compressed and reconstructed by different methods on the MATLAB simulation platform, and the reconstruction accuracy and reconstruction time of the different methods are compared to verify the superiority of DCS.
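Generating a random Gaussian measurement matrix and sizing it by the M ≥ cK log(N/K) rule can be sketched as follows; the value c = 2 and the 1/√M column scaling are illustrative assumptions (the text only says c is a small constant).

```python
import numpy as np

def gaussian_measurement_matrix(m, n, rng=None):
    """Random Gaussian measurement matrix with i.i.d. N(0, 1/m) entries,
    a common scaling (an assumption here) so columns have unit expected norm."""
    rng = rng or np.random.default_rng(0)
    return rng.standard_normal((m, n)) / np.sqrt(m)

n, k, c = 1500, 40, 2                           # signal length and sparsity from the text
m_min = int(np.ceil(c * k * np.log(n / k)))     # M >= c*K*log(N/K) sizing rule
Phi = gaussian_measurement_matrix(m_min, n)
```

With N = 1500 and K = 40 this rule gives M on the order of a few hundred measurements, which is consistent with the M = 150-400 range swept in the tables below.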
Method 1 uses DCS with the DCS learning dictionary as the sparse dictionary, a Gaussian random matrix as the observation matrix, and the orthogonal matching pursuit (OMP) algorithm as the reconstruction algorithm. Fig. 6 compares the original data and the reconstructed data.
Method 2 uses DCS with the FFT as the sparse basis, a Gaussian random matrix as the observation matrix, and the OMP algorithm as the reconstruction algorithm; Fig. 7 compares the original data and the reconstructed data. Method 3 applies CS to each signal separately with the same sparse basis, observation matrix, and reconstruction algorithm; Fig. 8 compares the original data and the reconstructed data. Comparing Fig. 6 with Fig. 7, it can be seen that the reconstruction accuracy using the DCS learning dictionary is higher than that using the FFT. Comparing Fig. 7 with Fig. 8, it can be clearly seen that the reconstruction accuracy of DCS is higher than that of CS. Then, we change the number of measured values. The original power quality signal length is 1500, and the sparsity of the obtained signals is between 35 and 40. The number of measured values M should be at least twice the sparsity, M ≥ 2K, so the number of measured values starts from 150 and increases by 50 each time. The RMSE is calculated when M is 150, 200, 250, 300, 350, and 400 (Tables 5-7).
It can be seen from the tables that the reconstruction error decreases as the number of measured values M gradually increases. When the number of measured values used by DCS is 1/3 less than that used by CS, the error remains within the acceptable range. For a more intuitive comparison, the data in the tables are used to draw Fig. 9, which shows the relationship between the reconstruction error and the compression ratio of the power quality signal, i.e. the trend in the RMSE of the reconstructed data as the number of measured values changes for the three methods. As shown in Fig. 9, the RMSE of Method 1 is lower than those of Methods 2 and 3 for the same number of measured values. Therefore, the performance of DCS on power quality data is better than that of CS, and the performance of the DCS learning dictionary is better than that of the FFT. When the number of measured values is small, the probability of successful reconstruction is still considerable, although there is a certain error. When the compression ratio is >0.1, the power quality signal can be reconstructed more accurately. Therefore, the DCS learning dictionary constructed in this study can reconstruct the original data accurately while greatly reducing the amount of data; the reconstruction error is small and reliable. We can use DCS to achieve efficient transmission, accurate reconstruction, and efficient storage of power quality data.
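The RMSE-versus-measurements sweep can be imitated on a synthetic sparse signal of the same length. The sparsity here is deliberately smaller than the 35-40 reported for the real signals so that this toy greedy reconstruction recovers reliably; the seed and matrix scaling are also assumptions.

```python
import numpy as np

def omp(Phi, y, k):
    """Toy orthogonal matching pursuit reconstruction (k-sparse)."""
    r, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ r))))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ coef
    theta = np.zeros(Phi.shape[1])
    theta[support] = coef
    return theta

rng = np.random.default_rng(5)
n, k = 1500, 10                           # signal length from the text; reduced sparsity
theta = np.zeros(n)
theta[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

errors = {}
for m in (150, 200, 250, 300, 350, 400):  # the measured-value counts swept in the tables
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    theta_hat = omp(Phi, Phi @ theta, k)
    errors[m] = float(np.sqrt(np.mean((theta - theta_hat) ** 2)))  # RMSE
```

As in the tables, the reconstruction error at M = 400 is no worse than at M = 150, reflecting the general trend that error falls as the number of measurements grows.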
To further demonstrate the superiority of using DCS and the DCS learning dictionary to process power quality data, we calculate the time taken by the three methods to process the four signals when the number of measured values is 150, 200, 250, 300, 350, and 400. Methods 1 and 2 take the total running time, and Method 3 takes the sum of each signal's processing time. The curves are shown in Fig. 10.
As shown in Fig. 10, when the same four signals are processed by DCS and CS with the same number of measured values, the processing speed of DCS greatly exceeds that of CS, and processing with the DCS learning dictionary is slightly faster than with the FFT. Next, the time taken by the three methods to process different numbers of signals is calculated with 200 measured values. The curves are shown in Fig. 11.
As seen in Fig. 11, with 200 measured values, the processing times of DCS and CS are almost equal only when the number of signals is 1. Once the number of signals is >1, the advantage of DCS in processing speed is particularly obvious: the larger the number of signals, the more significant the advantage of DCS. From Figs. 10 and 11, we can see that the reconstruction time of DCS is about 2/5 of that of CS. In summary, for a power system with a large quantity of data, the application of DCS and the DCS learning dictionary can improve reconstruction accuracy and greatly reduce processing time.

Conclusions
Massive power quality data compression is a major problem in the field of power systems. Traditional methods can no longer meet the information processing and analysis requirements of the modern power network. How to compress the massive information brought by the growing number of power system nodes, and how to accurately reconstruct the compressed data, are the main subjects of this study. This study makes the following improvements: (i) In view of the deficiencies of existing methods, DCS is introduced into the processing of power quality signals. It solves the problem that CS can only process one signal at a time.
(ii) We analyse the sparsity and correlation of power quality data and prove that they are sparse and strongly correlated.
(iii) Using the structure of DCS and the correlation of power quality data, the DCS learning dictionary is designed. It compensates for the lack of sparsity under a traditional sparse basis.
The simulation results prove the feasibility and value of the method proposed in this study. It combines the structure and characteristics of power quality signals well, and it benefits the analysis of power quality data in actual power systems.