Fault diagnosis and identification of malfunctioning protection devices in a power system via time series similarity matching

Alarm messages uploaded to a dispatch centre following the failure of a power system contain extensive temporal information. The accuracy and speed of fault diagnosis can be improved by taking full advantage of the temporal information contained in these messages. From this standpoint, a power system fault diagnosis model based on time series similarity matching is proposed herein. First, a set of suspected faulty components is determined after the occurrence of a fault or a set of faults. Then, on the basis of the specifications of protection devices, including protective relays and circuit breakers, and the defined time series model, a set of alarm hypothesis time series is generated, containing the action process information of the protection devices involved with these suspicious components. Meanwhile, the alarm messages received by the dispatch centre are pre-processed, and the alarm information time series is obtained. Subsequently, the faulty component is identified by calculating the similarity between each series in the alarm hypothesis time series set and the alarm information time series. Finally, the correctness and effectiveness of the proposed fault diagnosis model are demonstrated via a real-life scenario in a local power system in China.


INTRODUCTION
The objective of a power system fault diagnosis method is to identify the faulty element(s) and fault type(s), as well as to detect abnormal operation of protective relays and circuit breakers, by employing alarm signals, sequence of event (SOE) information and so forth. Such a method can assist system dispatchers in handling faults during an emergency, speed up fault handling, prevent the spread of the fault area, and facilitate the recovery of the outage area in the affected power system. Power system fault diagnosis has been extensively investigated over the past two decades, and several kinds of fault diagnosis methods have been proposed, such as those based on expert systems (ESs), analytic models, artificial neural networks (ANNs), Petri nets, fuzzy sets, and rough sets. The method based on ESs activates corresponding rules using received alarm signals as well as other information in order to identify faulty components through logic reasoning. This method has been applied in actual power systems [1]. However, the development and maintenance of a large knowledge base is not easy for a large-scale system with rapidly varying parameters, which is characteristic of an actual power system. Power system fault diagnosis methods that are based on analytical models [2][3][4][5][6] consider the logical relationships among faulty components, protective relays, and circuit breakers, formulate the fault diagnosis problem as a 0-1 integer programming problem, and seek the fault diagnosis result using an optimization algorithm. For fault diagnosis based on the analytical model paradigm, several practical methods have been proposed and implemented in actual power systems. For example, a method based on chance constrained programming is presented in [5] to address the uncertainties associated with alarm messages and the reliabilities of power system components; here, strong fault tolerance capability is achieved.
In [6], an analytic integer linear programming model is established to identify the suspected fault sections and false alarms generated by directional fault indicators. In [7], a fault diagnosis model is reported for a distribution system with integrated distributed generation. Although the fault diagnosis method based on the analytical model has already been implemented in some actual power systems, some necessary improvements need to be made, especially with respect to the full utilization of temporal information present in alarm messages in order to improve the accuracy and speed of fault diagnosis. The performance of the method based on ANNs [8] largely depends on the completeness of training samples, which is not easily guaranteed. In addition, the state of an actual power system undergoes frequent and continuous changes, and the ANN therefore needs to be retrained frequently. These factors limit the applications of the method based on ANN in actual power systems. The fault diagnosis method based on the matrix [9] utilizes the collected fault current information to construct a fault judgment matrix for identifying the faulty component(s), but it may obtain inaccurate diagnosis results under incomplete and/or incorrect information, especially in the cases involving faulty communications systems. The method based on Petri nets [10][11][12] can identify faulty components rapidly by employing structured and directed graph models as well as matrix computation. Nevertheless, most existing Petri net-based fault diagnosis methods do not consider the temporal characteristics of alarm messages. The method based on rough sets [13] is similar to that based on ESs; however, it demonstrates a stronger descriptive capability regarding heuristic domain expert knowledge.
To enhance the accuracy and speed of power system fault diagnosis, it is necessary to maximize the use of temporal characteristics of alarm messages. To this end, several methods have been proposed. A method based on analytical models utilizing a temporal constraint network for power system diagnosis is presented in [4]. In other studies, a method based on a weighted fuzzy Petri net with time-delay constraints is developed [10,12]. However, the temporal information has not been fully utilized in these works, and additional improvements are required.
In various fields of science, engineering, economy, and society, there exist several well-known time series that are continuously generated [13][14]. Data mining technology can be used to explore valuable knowledge or rules from large amounts of data. The method based on time series is very promising in data mining research [15].
With the development of the dispatching automation system in a modern power system, more fault alarm messages can be obtained by the affected dispatch centre. The temporal information of alarm messages with a unified time scale can be obtained by sequence of event (SOE) records synchronized by the global positioning system (GPS). The accuracy and speed of fault diagnosis are expected to be improved via utilization of time-tagged information contained in SOE records.
In existing time series mining methods, time series similarity is a very important criterion, and how to discriminate and measure similarity is an issue that needs to be addressed [16]. Similarity matching is an important method for analysing time series data. In [17], an incremental warping window is used to improve the accuracy of matching among time series.
As fault diagnosis is a prerequisite to accelerating the subsequent restoration stage in power systems [18], the accuracy of the fault diagnosis method needs to be further improved. Considering the reliability of the communication system in a modern power system, the fault diagnosis method based on time series similarity matching is employed for power system fault diagnosis. The contribution of this work is mainly reflected as follows: (i) The mathematical representation of a single alarm event with a time stamp is introduced. Depending on the configuration of each protection device, successive timing operation events of the protection device can be generated easily. (ii) A method for calculating the timing distance of power system alarms that can evaluate both missing/distorted alarms and false time stamps is proposed. This approach uses the dynamic time warping (DTW) distance in the field of data mining. (iii) The correlations between the time series in fault hypothesis alarms and the real alarms are converted to specific confidence levels to give dispatchers a more intuitive view of the credibility of various hypothetical failure causes.
The structure of the article can be summarized as follows. First, time series groups and similarity matching are described. Next, the temporal characteristics of alarm messages are described, and a time series model is examined. On this basis, the so-called power system time series distance is introduced to facilitate the presentation of measuring the similarity between the hypothetical and actual alarm series. Then, the time series distance between the hypothetical and actual alarm series is converted into the fault confidence levels of components to identify the faulty component(s) and fault type(s). Finally, some fault scenarios in an actual power system are employed to demonstrate the accuracy and efficiency of the proposed model for power system fault diagnosis.

Mathematical description of time series
A time series X is an ordered set of elements, each composed of a recorded value of a physical quantity and the corresponding time node. It can be described as

$$X = \langle x_1 = (v_1, t_1),\ x_2 = (v_2, t_2),\ \ldots,\ x_n = (v_n, t_n) \rangle$$

where x_i = (v_i, t_i) represents the recorded information v_i of X at time t_i, and n is the cardinality of X, that is, |X| = n. The subscript of t increases strictly with time (i.e. t_i ≤ t_j implies i ≤ j).
In the narrow definition of a time series, v_i generally refers to a real value, while in the broad definition, v_i can also refer to multimedia data, discrete symbols, customer model data, and other information.
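As a minimal sketch of the definition above, a time series can be held as a list of (value, time) pairs. The names `Element` and `is_chronological` are illustrative assumptions and do not appear in the original work.

```python
from typing import Any, List, NamedTuple


class Element(NamedTuple):
    """One time series element x_i = (v_i, t_i)."""
    v: Any     # recorded value: a real number or, in the broad sense, a symbol
    t: float   # time node of the record


def is_chronological(series: List[Element]) -> bool:
    """Return True if the time nodes never decrease along the series."""
    return all(a.t <= b.t for a, b in zip(series, series[1:]))


# Illustrative data only.
X = [Element("relay operates", 50), Element("breaker trips", 100)]
print(len(X), is_chronological(X))
```

Here `len(X)` plays the role of the cardinality |X| = n.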

Associated time series groups
If the first k elements in n time series are the same, the associated time series group can be described as

$$G = (X_s, G_x, f)$$

where X_s is the time series formed by the k common elements; G_x = {X_1, X_2, …, X_n} is the set of time series that follow X_s; and f is a flag variable. f = 1 indicates that the time series set {X_1, X_2, …, X_n} is a concurrent one (i.e. the time series in G_x occur simultaneously with the occurrence of X_s), and f = 0 indicates that the time series set {X_1, X_2, …, X_n} is mutually exclusive (i.e. one and only one time series in G_x occurs after the occurrence of X_s). It can be seen that an associated time series group is a set of three elements that can describe the correlation between time series.

Calculation of time series similarity
The similarity between two time series can be measured using a distance [16], which can be defined in different ways according to the application scenario. The smaller the distance, the greater the similarity between the two time series.
There are many methods that can be used to measure the distance between time series, such as the Euclidean distance [19], the dynamic time warping distance [20][21], the longest common string [16], and the edit distance [22][23]. Among them, the edit distance is a measurement of the distance between two string sequences. To compute the edit distance, the time series should first be transformed into strings via quantization and encoding. The edit distance is the minimum number of editing operations required to convert one string into another. Editing operations include insert, delete, and replace. The edit distance D_{i,j} between the first i elements of the time series X = ⟨x_1, x_2, …, x_n⟩ and the first j elements of the time series Y = ⟨y_1, y_2, …, y_m⟩ can be calculated recursively from D_{0,0}:

$$D_{i,j} = \min\{\, D_{i-1,j} + 1,\ D_{i,j-1} + 1,\ D_{i-1,j-1} + L(x_i, y_j) \,\}, \qquad D_{0,0} = 0,\ D_{i,0} = i,\ D_{0,j} = j$$

where L(x_i, y_j) = 0 when x_i = y_j; otherwise, L(x_i, y_j) = 1. The edit distance can be used to measure the severity of distorted/missing alarm messages in a power system, but it is less effective for unsynchronized time series. For example, if the time stamp information of an alarm message is inaccurate, the result that is obtained may not accurately represent the severity.
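The edit distance recursion described above can be sketched with the standard dynamic programming table. This is a generic implementation of the classical edit distance on symbol sequences, not code from the original work; the function name `edit_distance` is an assumption.

```python
from typing import Sequence


def edit_distance(x: Sequence, y: Sequence) -> int:
    """Classical edit distance with unit-cost insert, delete, and replace."""
    n, m = len(x), len(y)
    # D[i][j] is the distance between the first i symbols of x
    # and the first j symbols of y.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1   # L(x_i, y_j)
            D[i][j] = min(D[i - 1][j] + 1,            # delete x_i
                          D[i][j - 1] + 1,            # insert y_j
                          D[i - 1][j - 1] + cost)     # replace or keep
    return D[n][m]


# One missing alarm between an expected and a received symbol sequence.
print(edit_distance(["relay", "trip", "reclose"], ["relay", "reclose"]))
```

Because only the symbols are compared, two sequences with identical events but shifted time stamps still have distance 0, which is the limitation noted in the text.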
The DTW distance has been successfully applied in digital voice processing, and it can be applied to two time series with different lengths. The essence of this method is to calculate the distance between two time series by looking for the minimal warping path between them. For X = ⟨x_1, x_2, …, x_n⟩ and Y = ⟨y_1, y_2, …, y_m⟩, DTW can be defined in a recursive form:

$$D_{dtw}(\langle\rangle, \langle\rangle) = 0, \qquad D_{dtw}(X, \langle\rangle) = D_{dtw}(\langle\rangle, Y) = \infty$$

$$D_{dtw}(X, Y) = d(x_1, y_1) + \min\{\, D_{dtw}(X, Rest(Y)),\ D_{dtw}(Rest(X), Y),\ D_{dtw}(Rest(X), Rest(Y)) \,\}$$

where d(x, y) = ||x − y||; Rest(X) = ⟨x_2, …, x_n⟩; Rest(Y) = ⟨y_2, …, y_m⟩. Both missing and false alarms and the interference caused by unsynchronized time signals should be appropriately considered in power system fault diagnosis. The advantages of the DTW distance and the edit distance are therefore combined, and a new distance concept is proposed, as discussed in the section "Improved time series distance", to measure the similarity between two alarm time series.
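The DTW recursion can be sketched as follows for real-valued series, with d(x, y) = |x − y|. The table is filled over prefixes rather than over Rest(·) suffixes, which yields the same value; the function name `dtw_distance` is an assumption.

```python
import math
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Dynamic time warping distance between two real-valued series."""
    n, m = len(x), len(y)
    # D[i][j] is the DTW distance between the first i values of x
    # and the first j values of y; an empty series against a
    # non-empty one is infinitely far away.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Series of different lengths: a repeated sample costs nothing.
print(dtw_distance([0, 50, 100], [0, 50, 50, 100]))
# One time stamp shifted by 10.
print(dtw_distance([0, 50, 100], [0, 60, 100]))
```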
Time series similarity matching can be divided into the whole sequence matching query and the sub-sequence matching query. The whole sequence matching query aims to find, in a given set of time series, those that are similar to the query sequence as a whole. The length of a matching sequence that is obtained should be approximately the same as that of the query sequence. Through sub-sequence matching queries, all sub-sequences whose distance from the query sequence is less than a specified value are found, and their position offsets and lengths are calculated. When a fault occurs in a power system, alarms are all sent to the dispatch centre, and the alarm sequence is then formed. A hypothetical alarm series, whose length is less than that of the alarm sequence, is formed from the fault hypothesis of the suspicious component(s). If a sub-time series of the alarm sequence is found whose distance from the alarm hypothesis time series is less than a given distance, the fault hypothesis can then be considered true. For this reason, the sub-sequence matching query is employed in this study.

TIME SERIES IN POWER SYSTEMS
The time of an action event can be automatically recorded by a relay device or a smart electricity meter, and SOE information with a unified timing reference is then formed. Accurate time information is thus attached to these time series.

Time sequence model in power systems
Following the notation used in the rest of this article, an element of a power system time series is written as x = (v, t) with v = (A, Δt, S), where A is the alarm event, t is the event time, and Δt is the allowable time error. S = 1 or 0 means that the element x = (v, t) is a required item or a fuzzy item, respectively.
Consider a time series X_1 that has one more element than a time series X_2. If that element is a fuzzy item, then the edit distance between X_2 and X_1 is set to 0. If the edit distance between two time series is 0, then these two time series are considered to be mutually equivalent. If the element is a required item rather than a fuzzy one, the edit distance between X_2 and X_1 is set to 1.
After the occurrence of an event in a power system, some alarm messages may not be sent successfully or sent at all to the power dispatching centre. In this case, these alarm messages would not be included in the time series of received alarms. The fuzzy item is introduced in the time series model to model missing or unobservable alarms.
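The effect of fuzzy items on the edit distance can be illustrated with a small variant of the classical recursion in which dropping a fuzzy item (S = 0) costs nothing. This is only one plausible reading of the rule stated above; the class `AlarmElement`, the function `fuzzy_edit_distance`, and the choice to compare elements by event name alone are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AlarmElement:
    event: str   # A: the alarm event
    dt: float    # Δt: allowable time error (ms)
    s: int       # S: 1 = required item, 0 = fuzzy item
    t: float     # event time (ms)


def fuzzy_edit_distance(x: List[AlarmElement], y: List[AlarmElement]) -> int:
    """Edit distance on event names; skipping a fuzzy item is free."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + x[i - 1].s
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + y[j - 1].s
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = x[i - 1].event == y[j - 1].event
            D[i][j] = min(D[i - 1][j] + x[i - 1].s,          # skip x_i
                          D[i][j - 1] + y[j - 1].s,          # skip y_j
                          D[i - 1][j - 1] + (0 if same else 1))
    return D[n][m]


relay = AlarmElement("relay acts", 10, 1, 0)
reclose = AlarmElement("C2 recloses", 50, 1, 1050)
X2 = [relay, reclose]
X1_fuzzy = [relay, AlarmElement("C2 opens", 10, 0, 50), reclose]
X1_required = [relay, AlarmElement("C2 opens", 10, 1, 50), reclose]
print(fuzzy_edit_distance(X1_fuzzy, X2), fuzzy_edit_distance(X1_required, X2))
```

With the extra element marked as fuzzy the two series are equivalent (distance 0); with the same element marked as required the distance is 1.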

General configuration model for protection devices in power systems
When a fault occurs in a power system, an alarm hypothesis time series will be formed using the operation information of protective relays and circuit breakers. According to the setting specification of a protection device, a series of time series for alarm hypothesis may be built, as shown in Figure 1 ($ represents the end point of the time series). Figure 1 shows that each time series generated by a protective device configuration is terminated by "$" or the action of another set of protection devices. If another set of protection devices is triggered, the time series will be longer because the action of the protection device represents the starting element of another time series. For example, if a protection device fails to operate in response to a fault, its backup protection devices will be triggered. Therefore, the queried time series are composed of all these time series.
For any element in a time series group G, the event time t i is specified by the setting value of the protection device concerned, and the time length ∆t i is set by the allowable time error. When the protection device operates, the time series in the related time series group starts to occur successively, and an accurate time series can then be formed by the messages received by the power dispatch centre. A time series sample is illustrated by the example in Figure 2. It is assumed that an instantaneous single-phase grounding fault occurs on transmission line L1 at t = 0, and the circuit breakers are successfully reclosed after a setting period.
The protective relay operating logic for transmission line L1 can be described by the related time series group G = (X_S, G_X, f), where X_S = {((the main protective relay for line L1 at the C2 side acts, 10, 1), 0)}; G_X = {{((C2 opens, 10, 1), 50), ((C2 recloses, 50, 1), 1050)}, {((C2 opens, 10, 1), 50), ((the failure protection device for C2 acts, 10, 1), 250)}}. Once the main protective relay for line L1 at the C2 side operates, the circuit breaker C2 will be tripped. If the circuit breaker C2 trips successfully, C2 will reclose after 1000 ms. If C2 fails to trip or the insulation of C2 is broken down, the failure protection would operate to trip C2. Before the tripped alarm of the circuit breaker failure protection of C2 is received by the dispatch centre, the state of C2 is uncertain and two possible states of C2 may occur: (i) the tripped alarm of C2 is uploaded but C2 actually fails to trip; (ii) the insulation of C2 is broken down, and the tripped alarm of C2 is not uploaded. Therefore, ((C2 opens, 10, 0), 50) can be described as a fuzzy item. Depending on whether C2 trips successfully, two mutually exclusive time series are possible (f = 0), and only one of them actually occurs under each fault scenario.
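The related time series group for line L1 can be written down as plain data and expanded into complete alarm hypothesis series. The sketch below uses the element values quoted in the text; the names `Element`, `Group`, and `expand`, and the merge-by-time handling of the concurrent case, are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    event: str   # A
    dt: int      # Δt in ms
    s: int       # S: 1 = required, 0 = fuzzy
    t: int       # event time in ms


class Group(NamedTuple):
    x_s: List[Element]         # common leading series X_S
    g_x: List[List[Element]]   # following series G_X
    f: int                     # 1 = concurrent, 0 = mutually exclusive


def expand(group: Group) -> List[List[Element]]:
    """List the complete hypothesis series implied by a group."""
    if group.f == 0:
        # Exactly one branch follows the common part.
        return [group.x_s + branch for branch in group.g_x]
    # Concurrent branches all occur: merge them in time order.
    merged = sorted((e for b in group.g_x for e in b), key=lambda e: e.t)
    return [group.x_s + merged]


G = Group(
    x_s=[Element("main protective relay for L1 at C2 side acts", 10, 1, 0)],
    g_x=[
        [Element("C2 opens", 10, 1, 50), Element("C2 recloses", 50, 1, 1050)],
        [Element("C2 opens", 10, 1, 50),
         Element("failure protection device for C2 acts", 10, 1, 250)],
    ],
    f=0,
)
hypotheses = expand(G)
for h in hypotheses:
    print([e.t for e in h])
```

Each printed list gives the expected event times of one mutually exclusive hypothesis: successful reclosing, or operation of the failure protection.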

Alarm hypothesis time series set
The alarm hypothesis time series refers to a series that may occur when an event happens, and it can be obtained from the configurations of protection devices. The actual alarm time series is obtained from the dispatch centre. Time series matching refers to the procedure of finding the alarm hypothesis time series that best matches the actual alarm time series. In Figure 2, it is assumed that an instantaneous single-phase grounding fault occurs on line L1 at time t = 0, and the protective relays on both sides of L1 operate. The action logic of the protection device configuration can be expressed by a time series whose elements include x′_2 = (v_2, t_2) = ((the main protective relay for L1 in station A operates, 10, 1), 50); x′_3 = (v_3, t_3) = ((the main protective relay for L1 in station B operates, 10, 1), 50); x′_4 = (v_4, t_4) = ((the circuit breaker C1 trips, 10, 1), 100); x′_5 = (v_5, t_5) = ((the circuit breaker C2 trips, 10, 1), 100); x′_6 = (v_6, t_6) = ((the circuit breaker C1 recloses, 10, 1), 1100); and x′_7 = (v_7, t_7) = ((the circuit breaker C2 recloses, 10, 1), 1100).
The related alarm time series group G_1 includes the time sequence hypotheses X_3 and X_4. The single-phase grounding fault that occurs on L1 would trigger X_3 and X_4 simultaneously, and thus the relevant flag f = 1.
With respect to the alarm time sequences X_3 and X_4 in G_1, if they match the actual alarm sequence X_2 = ⟨x_2, x_3, x_4, x_5, x_6, x_7⟩, then it can be concluded that X_3 and X_4 have actually occurred. Thus, it can be determined that an instantaneous single-phase grounding fault occurred on line L1 at time t = 0.

Improved time series distance
When the similarity of two time series is measured, both their differences and the impacts caused by false time stamps need to be considered. Based on the concepts of the edit distance and the DTW distance, a distance that can be used to measure the similarity of two alarm time series is proposed. First, the following symbols are described. The operator |= represents a special relationship between two time series elements x = (v_x = (A_x, Δt_x, S_x), t_x) and y = (v_y = (A_y, Δt_y, S_y), t_y): if A_x = A_y, then x |= y. Correspondingly, x |∈ Y denotes that the time series Y contains an element y with x |= y, and x |∉ Y denotes that it does not.
For two time series X = ⟨x_1 = (v_x1, t_x1), x_2 = (v_x2, t_x2), …, x_n = (v_xn, t_xn)⟩ and Y = ⟨y_1 = (v_y1, t_y1), y_2 = (v_y2, t_y2), …, y_m = (v_ym, t_ym)⟩ (n ≤ m), if there is no fuzzy item included in X and Y, the distance D(X, Y) between X and Y is defined by Equation (9), in which the weighting factors a and b are applied to an edit distance term and a time distance term, respectively, both computed between X and Y_s. Here, the sub-time series Y_s = ⟨y′_1, y′_2, …, y′_p⟩ in Y matches X: for any y′_i ∈ Y_s, y′_i |∈ X holds, and y′_i is an element of the time series Y (i = 1, 2, …, p). D_edit(X, Y_s) is the edit distance between the time series X and Y_s, and it can be used to identify false alarm information. Considering that Y_s is a sub-time series of Y, a simplified formula for calculating D_edit(X, Y_s) can be obtained. The second term of Equation (9) is used to measure the time distance between the time series X and Y_s.
The severity of the missing/distorted alarm information in the power system can be measured using the edit distance. The larger the edit distance, the lower the confidence level of the alarm hypothesis time series. If the percentage of distorted/missing alarms is high, a can be set to a small value to improve the fault tolerance capability of the fault diagnosis model. Conversely, a large value of a can be employed to mitigate the interference and avoid mistakes in the fault diagnosis process. Similarly, if the time stamp information is less accurate, the value of b can be set to be smaller to mitigate the impacts of false time stamps.
Both missing/distorted alarms and false time stamps can result in a difference between the alarm hypothesis time series and the actual alarm time series. The weighting factors a and b are used to quantify the difference between these two time series. The smaller the value of D(X, Y), the smaller will be the difference between these two time series.
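The idea of weighting a mismatch term and a time stamp term can be sketched as follows. The exact formulas of the original work are not reproduced here: the weighted sum a·D_edit + b·D_time, the count of required hypothesis elements without a counterpart as D_edit, and the average time deviation in excess of Δt (in units of Δt) as D_time are all assumptions chosen for illustration, as are the names `Element` and `alarm_distance`.

```python
from typing import Dict, List, NamedTuple


class Element(NamedTuple):
    event: str   # A
    dt: float    # Δt in ms, assumed > 0
    s: int       # S: 1 = required, 0 = fuzzy
    t: float     # event time in ms


def alarm_distance(x: List[Element], y: List[Element],
                   a: float = 1.0, b: float = 1.0) -> float:
    """Assumed form: a * (missing required alarms) + b * (time deviation)."""
    # Y_s: the first element of Y for every event, looked up by event name.
    first_in_y: Dict[str, Element] = {}
    for e in y:
        first_in_y.setdefault(e.event, e)
    missing = 0
    excess = []
    for e in x:
        match = first_in_y.get(e.event)
        if match is None:
            missing += e.s   # a missing fuzzy item costs nothing
        else:
            # Deviation beyond the allowable error, in units of Δt.
            excess.append(max(0.0, abs(e.t - match.t) - e.dt) / e.dt)
    d_time = sum(excess) / len(excess) if excess else 0.0
    return a * missing + b * d_time


X = [Element("relay", 10, 1, 50), Element("trip", 10, 1, 100),
     Element("reclose", 10, 1, 1100)]
Y = [Element("relay", 10, 1, 52), Element("other", 10, 1, 60),
     Element("trip", 10, 1, 130)]
print(alarm_distance(X, Y, a=5, b=5))
```

In this illustrative data, one required alarm is missing and one time stamp lies 20 ms outside its tolerance, so both terms contribute. Lowering a makes the measure more tolerant of missing alarms, and lowering b makes it more tolerant of inaccurate time stamps, as described in the text.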

Confidence levels of alarm hypothesis time series and included elements
The smaller the distance between the alarm hypothesis time series X and the actual alarm time series Y, the higher the confidence level of the hypothesis. The confidence level C_X of the alarm hypothesis X is defined by Equation (12). In Equation (12), the distance between the alarm hypothesis time series and the actual alarm time series is mapped into [0, 1] by an inverse proportion function, and the result serves as the confidence level of the hypothesis. If the distance is sufficiently small, the confidence level of the alarm hypothesis approaches 1.
When a fault occurs in the power system, the relevant protective relay(s) will operate to trip the corresponding circuit breakers. Under complex situations with multiple faults and/or malfunctioning protection devices, there may be two or more alarm hypothesis time series with high confidence levels. For the example shown in Figure 2, if a single-phase grounding fault occurs on line L 1 , both X 3 and X 4 may have high confidence levels.
In the set of alarm hypothesis time series, the same time series element x_i = (v_i = (A_i, Δt_i, S_i), t_i) may exist in multiple alarm hypothesis time series X_1, X_2, …, X_n. If the time series X_1, X_2, …, X_n are not conflicting, the confidence level of the element x_i should be the average confidence level of X_1, X_2, …, X_n. If the time series X_1, X_2, …, X_n are mutually exclusive, the maximum confidence level of X_1, X_2, …, X_n should be assumed by x_i. The confidence level of x_i is then given by

$$C_{x_i} = \begin{cases} \dfrac{1}{n} \sum_{k=1}^{n} C_{X_k}, & X_1, X_2, \ldots, X_n \text{ are not conflicting} \\[2mm] \max\limits_{1 \le k \le n} C_{X_k}, & X_1, X_2, \ldots, X_n \text{ are mutually exclusive} \end{cases} \qquad (13)$$

The similarity of time series can then be converted into the confidence level of the action of an element using Equations (12) and (13); in this manner, the faulty element(s) can be identified.
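The two-stage conversion from distance to confidence can be sketched as follows. The averaging and maximum rules follow the text directly; the specific mapping 1/(1 + D) is only an assumed example of an inverse proportion function that maps a non-negative distance into (0, 1], and the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def hypothesis_confidence(distance: float) -> float:
    """Assumed inverse-proportion mapping of a distance D >= 0 into (0, 1]."""
    return 1.0 / (1.0 + distance)


def element_confidence(confidences: Sequence[float],
                       mutually_exclusive: bool) -> float:
    """Combine the confidences of all hypotheses that contain one element."""
    if mutually_exclusive:
        return max(confidences)   # only one of the hypotheses can occur
    return mean(confidences)      # non-conflicting hypotheses are averaged


c = [hypothesis_confidence(d) for d in (0.0, 1.0)]
print(c)
print(element_confidence(c, mutually_exclusive=False))
print(element_confidence(c, mutually_exclusive=True))
```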

Corrections of the confidence levels for missing alarms
In an actual power system, false alarm messages cannot be completely avoided because the communication channel and related devices may not always function properly. Before carrying out fault diagnoses, it is necessary to pre-process the received alarms in order to achieve accurate fault diagnosis. For a fault scenario with missed alarm(s), a correction strategy is proposed to enhance the accuracy of fault diagnosis.
If the length of the alarm hypothesis time series X is n (n ≥ 3) and the edit distance between X and the actual alarm time series Y is d (0 < d ≤ n/3), then there must be an element x_i = (v_i = (A_xi, Δt_xi, S_xi), t_xi) in X without a counterpart in Y, that is, x_i |∉ Y. In this case, it can be assumed that the time series element y_i = (v_yi = (A_xi, Δt_xi, S_xi), t_xi) is missed, and a virtual time series element y_i can be inserted into Y to form a new time series Y'. Then, the component operation confidence level is recalculated using Equations (9)-(13) for the alarm hypothesis time series containing the event A_xi. In the calculation process, the confidence level of the virtual event y_i is updated using Equation (14),
where C_xi is the component operation confidence level calculated from X and Y'; n is the cardinality of X, that is, the number of elements in X; and d is the edit distance between X and Y. The value of d is positively correlated with the number of missed alarms. From Equation (14), it can be determined that the greater the value of n and the smaller the number of false alarms, the higher will be the confidence level of the hypothesis element x_i and the virtual event y_i.
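The insertion of virtual elements for missed alarms can be sketched as follows. Equation (14) itself is not reproduced; the sketch only forms Y' under the stated conditions n ≥ 3 and 0 < d ≤ n/3, and it approximates d by the number of hypothesis elements without a counterpart in Y. That approximation and the names `Element` and `insert_virtual_alarms` are assumptions.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    event: str   # A
    dt: float    # Δt in ms
    s: int       # S
    t: float     # event time in ms


def insert_virtual_alarms(x: List[Element], y: List[Element]) -> List[Element]:
    """Return Y': Y plus virtual copies of hypothesis elements absent from Y."""
    events_in_y = {e.event for e in y}
    missing = [e for e in x if e.event not in events_in_y]
    n, d = len(x), len(missing)
    if n < 3 or not (0 < d <= n / 3):
        return list(y)   # conditions for the correction are not met
    # Each virtual element copies (A, Δt, S) and t from the hypothesis.
    return sorted(list(y) + missing, key=lambda e: e.t)


X = [Element("relay", 10, 1, 50), Element("trip", 10, 1, 100),
     Element("reclose", 10, 1, 1100)]
Y = [Element("relay", 10, 1, 52), Element("reclose", 10, 1, 1105)]
Y_prime = insert_virtual_alarms(X, Y)
print([e.event for e in Y_prime])
```

When too many alarms are missing relative to the length of the hypothesis (d > n/3), the received series is returned unchanged, since the hypothesis itself is then poorly supported.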

FAULT DIAGNOSIS PROCEDURE
When a fault occurs in a power system, the relevant protective relay(s) will operate to trip the corresponding circuit breaker(s), and the faulty component(s) will then be isolated. Real-time status information of these circuit breakers can be used to identify the outage area(s). Faulty components must be included in the outage area(s), and hence fault diagnosis can be limited to the outage area(s) [9]. Because the numbers of components and protection devices in the outage area(s) are limited, it is not difficult to obtain sufficient hypothesis series for each protective device according to the general configuration model of the protective relay(s) shown in Figure 1. In this manner, online fault diagnosis can be implemented. The identification of the outage area(s) should ideally be carried out after the fault procedure is completed, that is, when no more protective relays or circuit breakers will operate. At this time, complete alarm messages have been received by the dispatch centre. The actual alarm time series can be formed in chronological order according to the time of arrival of alarms. The fault diagnosis model based on time series matching can then be solved quickly and effectively, and the requirements for online fault diagnosis can be met. The fault diagnosis procedure is shown in Figure 3. The specific steps are as follows:
Step 1: A set of suspected faulty components is obtained by searching the outage area.

FIGURE 3 Fault diagnosis in power systems based on time series matching (flowchart: input the network topology information and protection configuration information, followed by Steps 1 to 5)
Step 2: The alarm hypothesis time series of the suspected faulty components are generated from the general configuration model of the protection devices. Each time series will either end with the symbol '$' or end with the action event of the protective device with the next priority. In the latter case, the complete time series can be obtained by adding the time series corresponding to the action events of the subsequently triggered protection devices. Thus, the hypothetical time series of alarms can be formed.
Step 3: The distance between each alarm hypothesis time series and the actual alarm time series received by the dispatch centre is calculated using the sub-time series matching method.
Step 4: The obtained time series distances are converted into the confidence levels of the action event for each component. Then, the fault confidence level for each suspected element can be obtained, and the faulty component(s) will then be finally determined.

CASE STUDIES
In this section, the proposed method is demonstrated by the fault scenarios that occur in an actual power system. The structure of the actual power system is shown in Figure 4. To facilitate the presentation, circuit breakers (CBs) are renamed, and they are also shown in Figure 4. For the convenience of description, related alarm messages are encoded as shown in Table 1.
The reported alarms and the corresponding actual alarm time series are shown in Table 2. The time of occurrence of the first alarm is used as the reference time point. In the actual fault scenario, faults occur on L4335 and L4336, and C10 in the TL substation fails to trip; this scenario is used to demonstrate the proposed method.
The values of the weighting factors a and b in Equation (9) are both set to 5 to demonstrate equal importance between missing/distorted alarms and incorrect time stamps.
(i) The outage area, shown as the shaded portion of Figure 4, can be identified from the power system after the fault procedure is completed. There are three components (i.e. L4335, L4336, and B1-I) and eight circuit breakers (i.e. C3, C6, C10, C11, C12, C13, C14, and C18) in the outage area. Thus, the suspected faulty components are L4335, L4336, and B1-I.
(ii) The actual alarm time series X = ⟨x_1, x_2, …, x_16⟩ is constructed from the actual alarms received by the dispatch centre in Table 2. At the same time, the protection configuration models involving L4335, L4336, and B1-I are established as shown in Table 3 according to the configuration specifications of the protection devices. Based on the connections between suspected faulty components and protection devices, the alarm hypothesis time series sets are then generated, with elements such as ((a_26, 10, 1), 0).
(iii) The distances between the alarm hypothesis time series and the actual alarm time series are calculated and listed in Table 4.

TABLE 5 Comparison of the proposed method with two existing methods
Method | Model generation | Use of timing information | Time complexity and computational speed
Ref. [4] | According to the setting specifications of each protection device (protective relay or circuit breaker), the expected state value of each protection device is generated by logic operation. The logic is clear, although the process is complicated. | The correlation between the deviation of time stamps and the accuracy of alarms is partly employed. | O(2^n). An optimization model needs to be solved, and the computational speed is relatively slow.
Ref. [12] | It is necessary to construct a Petri net fault diagnosis model for suspected fault elements in advance, and the model generation process is complex. | Timing information is employed to filter alarm information in order to eliminate erroneous data, but timing features are not fully employed. | O(k^3 n). Matrix operation is involved, with fast computational speed.
The proposed method | According to the setting specifications of each protection device, a general model of protection configuration is generated for the protection devices involving suspected faulty components. The model generation process has a clear logic, and is easy to implement. | The correlation between the time stamp deviation and the accuracy of alarms is fully employed. | O(ln). The computational speed is fast.
Note: O(·) represents the time complexity (calculation scale); k is the matrix dimension; l is the length of the alarm information time series; n is the number of suspected faulty elements.
(iv) The distances listed in Table 4 are converted into the confidence levels of faulty components using Equations (12)-(14). The obtained fault confidence levels of L4335 and L4336 are 0.92 and 1, respectively, while the confidence level of B1-I failure is 0.1.

(v) Evaluation
The fault diagnosis results can now be obtained: two faults occur on L4335 and L4336, and the actual occurring time points of these two line faults are −20 (±10) ms and 840 (±10) ms, respectively. In addition, C10 fails to be tripped in TL substation. The fault diagnosis results are in agreement with the real-life fault scenario, and faulty components are correctly identified.

(vi) Comparison with previous methods
In the previously proposed methods utilizing temporal information, such as the methods based on analytic models [2], time-constrained networks [4], Petri nets [11,12] and so forth, the quantitative relationship between the time stamp deviation and alarm credibility cannot be accurately represented. In these studies, the pre-processed temporal information can only be used as a binary input parameter of the model, based on whether the time series conflicts with the action logic of the device, instead of being mined to calculate a confidence level with a clear meaning. Meanwhile, the specific confidence levels in Table 4 can help the dispatch centre staff to understand the cause of the fault more clearly, and can help track the real cause of alarms with incorrect time information in order to improve the communication reliability of the power system.
Two power system fault diagnosis methods using timing information presented in existing publications were also applied for the test cases. Comparisons between the proposed method and these two methods were carried out, and the results are exhibited in Table 5.
From Table 5, it can be seen that the proposed method maximizes the use of the timing and correlation characteristics of alarm events by introducing the concept of time series data mining and related techniques, and it also enhances the accuracy and error tolerance capability of fault diagnosis. The proposed method can still rapidly identify missing/distorted alarms for complex and concurrent faults. Compared with the other two methods, the proposed method can generate the fault diagnosis model and implement fault diagnosis more efficiently, and hence better meet the requirement of online fault diagnosis.

CONCLUSIONS
In this study, the concept of time series similarity matching and its related method are employed to address the fault diagnosis problem in a power system. Specifically, a power system fault diagnosis method based on time series similarity matching is developed, with temporal information and correlation characteristics among alarms being fully employed. Faulty components and fault types can be quickly identified, even under complicated fault scenarios with distorted/missing alarms. The operating accuracy of each protection device can also be evaluated. Finally, the fault diagnosis result of a real-life fault scenario demonstrated the accuracy and efficiency of the proposed method. Compared with the existing methods for power system fault diagnosis that employ the temporal information of alarm messages, the proposed method has some advantages. Not only is alarm time series similarity matching utilized, but the use of temporal information and associated characteristics of alarms is also maximized. The fault diagnosis accuracy and fault tolerance capability are improved by utilizing the time series distance to measure the degree of negative impacts of false alarms and false time stamps. Moreover, via the proposed method, the faulty protective relays and/or circuit breakers as well as distorted/missing alarms can be identified, even for situations involving complicated and/or successive faults.

ACKNOWLEDGEMENT
This work is supported by the National Key Research and Development Program of China (2017YFB0902900).