Volume 17, Issue 9 p. 1972-1984
ORIGINAL RESEARCH
Open Access

Accounting for component condition and preventive retirement in power system reliability analyses

Håkon Toftaker

Corresponding Author

SINTEF Energy Research, Trondheim, Norway

Correspondence

Håkon Toftaker, SINTEF Energy Research, Trondheim, Norway.

Contribution: Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing - original draft, Writing - review & editing

Jørn Foros

SINTEF Energy Research, Trondheim, Norway

Contribution: Conceptualization, Formal analysis, Investigation, Methodology, Resources, Validation, Writing - original draft, Writing - review & editing

Iver Bakken Sperstad

SINTEF Energy Research, Trondheim, Norway

Contribution: Conceptualization, Data curation, Methodology, Project administration, Validation, Writing - original draft, Writing - review & editing

First published: 10 March 2023

Abstract

Deteriorated power system components have a higher probability of failure than new components. Still, reliability of supply analyses traditionally model all components of the same type with the same probability of failure, and thus neglect the effect of deteriorated components. This paper presents a methodology to integrate a condition-dependent component probability of failure model into a power system reliability analysis. The component state is described by a semi-Markov process, and the paper shows how this, under reasonable assumptions, can be approximated by a Markov process. The Markov assumption simplifies the analysis and allows the model to include preventive retirement and be calibrated to statistical data. A case study using statistical data for Norwegian power transformers shows that, in the Norwegian power system, the proportion of failures that are due to poor condition is small, partly due to the common strategy of preventive retirement. However, if the condition of the transformers were worse, the impact of poor condition could be considerable. The methodology further enables the identification of the transformers that contribute most to the risk to the reliability of supply. The paper thus highlights the importance of accounting for the component condition in strategic decisions such as long-term renewal planning.

1 INTRODUCTION

Ageing power systems with deteriorating components are a major concern for the continued reliable supply of electric power. Addressing this concern calls for an integrated approach to power system reliability analysis in which reliability analyses at both the component level and the system level are included. Traditionally, the first level focuses on a single asset or component in the power system (e.g. a transformer station) but does not properly account for its importance in the power system for the reliability of supply. The second level takes a broader view of the power system but usually neglects how the condition of individual components influences their probability of failure and how this contributes to the overall power system risk. For instance, reliability of supply analyses applied in long-term transmission system planning studies commonly assume the same failure rate for all components of the same type. However, it is well known that deteriorated power system components have a higher probability of failure than new components.

The objective of this paper is to develop and illustrate a methodology for integrating information on component condition into reliability of supply analysis. This has multiple benefits. The methodology can aid asset management decisions by assessing the benefit of, for example, replacing an old transformer with a new one, and generally aid prioritization of measures between different components and locations in the power system. Moreover, better information on the components' probabilities of failure can improve the accuracy of reliability of supply analyses and better inform long-term system development decisions.

1.1 Related work

In power system reliability analysis, component failures have traditionally been modelled by a constant failure rate. However, some work has been done on accounting for the component condition through condition-dependent failure rates. A methodology for addressing and including ageing failures was originally described by Li in [1], and this approach has been further detailed in [2], and adopted by several subsequent papers. A broader review of available methods is found in [3]. There, age-related failures are categorized as either repairable or end-of-life failures. In [4], the influence of modelling transformer age-related failures in system reliability analyses is studied using an Arrhenius-Weibull model. That work was extended in [5] by quantification of uncertainties related to the failure model. In [6], the condition-dependent probability of failure of transformers is deduced based on the notion of an equivalent age similar to [7], and this is included in system reliability modelling following the approach of [1]. Other examples are found in [8] and [9], where the impact of time-varying failure rates on distribution system reliability is studied.

In this manuscript, we will not use the term ageing failure for condition-dependent failures, but instead the term wear-out failure. This highlights that these failures are caused by the degradation of technical conditions, and not directly by age itself. Ref. [3] follows [2] and assumes that a wear-out failure is unrepairable and subsequently that a failed component will stay out indefinitely. From the power system perspective and for the purpose of reliability of supply analysis, it is however important to consider the function performed by the component in the power system rather than the actual physical component itself. In other words, even if the physical component with a wear-out failure cannot be repaired, its function in the power system can be restored by replacing it with another physical component.

Two key challenges when accounting for the component condition in power system reliability analysis are how to assess the condition and how to deduce the wear-out failure rate from the condition. For large-scale system analyses an estimate of the aggregated overall condition of the components, which includes all failure causes and failure modes, is useful to keep the level of detail feasible. Such an estimate is often called a health index in the literature. Many methods have been suggested for aggregating condition grades into an overall health index, and this is still a topic of current research. Some early examples of applying expert judgement and weighting schemes are those of Anders et al. [10] and Jahromi et al. [11]. More recent examples are methods that utilize e.g. fuzzy theory [12, 13], Markov chains [14], Bayesian belief networks [15], machine learning methods [16, 17], and adjustment of failure rates [18].

In general, one may try to deduce the wear-out failure rate from the condition based on a physical understanding of the degradation processes, or with statistical or data-driven methods. The first approach is used in part in [4] and [7]. An example of the second approach is given in [19], where failure rates of underground power distribution cables are studied as a function of ageing and loading using regression models fitted directly to statistical data. A challenge with purely statistical or data-driven approaches is the large amount of data needed, which often is not available. Models based on the physics of degradation can circumvent this problem, but the predictions from such models may not straightforwardly agree with the observed failure rates in the power system. In [20], an effort is made to address this shortcoming by calibrating the failure rate to statistics, but the literature on this subject is otherwise limited.

Another factor that is important to take into account in power system reliability analysis, but which is rarely discussed in this context, is preventive retirement. This is a common asset management strategy whereby ageing components in poor condition are replaced (i.e. reinvestment) before running to wear-out failure. From a reliability perspective, it has the effect of considerably reducing the observed wear-out failure rate compared to the potential, underlying failure rate without a preventive retirement strategy.

Here, we propose a methodology for power system reliability analysis that accounts for component condition, preventive retirement, replacement of failed or retired components with new components, and calibration to observed statistics. The methodology thus addresses important shortcomings in the literature.

1.2 Outline and contributions

The methodology proposed in this paper is based on a failure model where the wear-out failure rate is established from available condition information given in terms of a health index. This model is integrated into an existing, analytical methodology for power system reliability analysis. The presented approach calculates annual reliability of supply indices and is intended for estimating the reliability of supply expected next year or over the next few years. To illustrate the methodology, a case study is presented using a simple power system model in which condition information is included for all transformers.

Two of the main contributions of this paper are: (1) The general integration concept for specifying condition-dependent failure rates for individual components in a power system reliability analysis, and (2) its application to integrate a transformer health model [7]. Thus, the present work builds upon [7] to bridge the gap between the individual components and the power system in analyses of the reliability of supply. Unlike related work such as [4] that integrates an Arrhenius-Weibull model, or [21] where degradation is modelled by assuming a bathtub curve, the present work integrates a more comprehensive, bottom-up component reliability model able to utilize available condition information. Specifically, this work extends [7] by the following additional contributions: (3) A competing risk model considering both wear-out and mid-life failures, (4) the inclusion of preventive retirement through a semi-Markov process, and (5) the calibration of component reliability to real, statistical data representative of the system. This allows us to (6) give empirical insight into the influence of the component condition on the reliability of supply.

The paper is organized as follows: Section 2 presents the general framework for the integration of component and power system reliability analysis. Section 3 describes how to account for the component condition and preventive retirement. In Section 4, the integration approach is demonstrated using an example transformer condition model. The section also illustrates which data sources and data processing methods are necessary to integrate such a model in a power system reliability analysis. It is then demonstrated how real condition and reliability data for Norwegian power transformers can be used to quantitatively analyze the impact of the component condition on the reliability of supply. Section 5 concludes the paper by summarizing potential improvements, extensions, and applications of the methodology.

2 METHODOLOGY FOR INTEGRATING COMPONENT AND POWER SYSTEM RELIABILITY ANALYSIS

This section describes the proposed methodology to account for individual component reliability in power system reliability analyses. The overall integration approach is described in Section 2.1. Then, the concrete reliability of supply analysis methodology is briefly presented in Section 2.2 to illustrate the requirements of the integration methodology as seen from the power system perspective.

2.1 Overall integration approach

A high-level framework for the overall reliability of supply analysis considered here is summarized in Figure 1. The blue rectangles represent different modules in the framework and the green boxes represent input and output data that define the interfaces of these modules. The uppermost row of green parallelograms represents the factors (information) that the analysis ideally should take into account. The highlighted parts of Figure 1 are the main concerns of this paper, namely the integration of the technical condition of components in the reliability of supply analysis.

Figure 1. High-level framework for power system reliability of supply analysis, with the main concern of this paper highlighted in purple.

Figure 1 distinguishes between a component reliability model and a power system reliability model. The component reliability model in general includes a failure model, a preventive retirement model, calibration to observed failure statistics, and an outage time model. Here, we will give attention to the modelling of component failure rates. We will not consider component outage times in similar detail.

Depending on the power system reliability analysis methodology, the inputs to the system reliability model could be failure rates, probabilities of failure (e.g. for a given year or a given hour), or average component availabilities. Here, we use as a starting point an analytical reliability analysis methodology for which the inputs must be in the form of annual failure rates. A resolution of one year is sufficient for the failure rates since we focus on the condition dependence of the failure rates, and the condition typically does not vary appreciably within one year for electric power components such as transformers.

2.2 Power system reliability analysis

Methods for power system reliability analysis are commonly divided into two groups: (i) analytical methods and (ii) Monte Carlo simulation methods [2, 22, 23]. The high-level framework in Figure 1 is general and could represent (in a generic and simplified manner) any method for power system reliability analysis. In the rest of this paper, we will focus on accounting for the component condition in analytical methods. One reason is that it allows us to provide additional analytical insights through the presentation of the proposed methodology. Another reason is that it allows us to integrate the methodology into an existing, comprehensive analytical reliability of supply methodology (OPAL) [22, 24]. Previous works have demonstrated the integration into OPAL of other factors indicated in Figure 1, such as multiple operating states from power market models [25, 26], time-dependent reliability data from the Norwegian fault and interruption statistics database [26] (FASIT [27]), and time-dependent interruption costs as calculated in the Norwegian cost of energy not supplied (CENS) scheme [26, 28]. As with most reliability of supply analyses, OPAL does not, however, account for the technical condition of the individual power system components. We should nevertheless stress that the methodology proposed in this paper is general. Section 5 will discuss how the methodology can be generalized to account for the component condition in Monte Carlo-based methods for reliability of supply analysis [23, 29], and we also refer to [30] for some already published initial work towards this goal.

The OPAL methodology is based on the analytical minimal cut set methodology [22, 24]. Contributions to the reliability of supply indices are calculated for each operating state i, each delivery point k, and each contingency j that corresponds to a minimal cut set for delivery point k and operating state i. For the expected annual energy not supplied (ENS), these contributions can be calculated as:
$${\mathrm{ENS}_{\mathrm{a}}}_{i,j,k} = \lambda'_{i,j} \cdot r_{i,j} \cdot P_{\mathrm{interr},i,j,k}, \tag{1}$$
where $\lambda'_{i,j}$ and $r_{i,j}$ denote equivalent failure rates and outage times for contingency j, and $P_{\mathrm{interr},i,j,k}$ denotes the power interrupted at delivery point k. The interrupted power is estimated in a contingency analysis that simulates the consequences in the power system of the changes in grid topology for each combination of contingency and operating state. Here, an optimal power flow model is used to capture grid constraints and necessary generator rescheduling and load-shedding actions during the contingencies [25].

In the analytical method considered here, equivalent failure rates and outage times for cut sets are calculated using approximate frequency and duration techniques [22, 24]. Moreover, we have here introduced the notation $\lambda'_{i,j} = \lambda_{i,j} \Delta t_i / \left( \sum_i \Delta t_i \right)$, where $\Delta t_i$ is the duration of operating state i, to account for the fraction of the year represented by each operating state [24].

Other reliability of supply indices are calculated as described in detail in [24]. The indices can be aggregated by summing over all contingencies, all operating states, all delivery points, or a combination of these sets. Here, the expected annual cost of energy not supplied (CENS) will be calculated by assuming a specific interruption cost $c_k$ that only depends on the customer type at delivery point k:
$${\mathrm{CENS}_{\mathrm{a}}}_{k} = c_k \cdot {\mathrm{ENS}_{\mathrm{a}}}_{k}. \tag{2}$$
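As a minimal sketch of how Equations (1) and (2) combine, the Python snippet below sums cut-set contributions per delivery point after weighting each failure rate by the share of the year covered by its operating state. The `Contribution` class, the function names, the units (hours, MW, cost per MWh) and all numbers are illustrative assumptions; they are not taken from OPAL or from the case study.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    """One minimal-cut-set term (hypothetical container, not an OPAL type)."""
    state: str            # operating state i
    contingency: str      # contingency j
    delivery_point: str   # delivery point k
    failure_rate: float   # lambda_{i,j}, failures per year
    outage_time: float    # r_{i,j}, hours per failure (assumed unit)
    p_interrupted: float  # P_interr,i,j,k, MW (assumed unit)


def weighted_rate(failure_rate, state_duration, total_duration):
    """lambda'_{i,j} = lambda_{i,j} * dt_i / sum_i dt_i."""
    return failure_rate * state_duration / total_duration


def annual_ens(contributions, state_durations):
    """Sum Equation (1) over states and contingencies, per delivery point."""
    total = sum(state_durations.values())
    ens = {}
    for c in contributions:
        lam = weighted_rate(c.failure_rate, state_durations[c.state], total)
        term = lam * c.outage_time * c.p_interrupted
        ens[c.delivery_point] = ens.get(c.delivery_point, 0.0) + term
    return ens


def annual_cens(ens, specific_cost):
    """Equation (2): CENS_k = c_k * ENS_k for each delivery point k."""
    return {k: specific_cost[k] * value for k, value in ens.items()}


# Illustrative numbers only (not from the paper).
durations = {"winter": 4380.0, "summer": 4380.0}  # hours per operating state
contribs = [
    Contribution("winter", "T1", "DP1", 0.02, 100.0, 10.0),
    Contribution("summer", "T1", "DP1", 0.02, 100.0, 4.0),
]
ens = annual_ens(contribs, durations)
cens = annual_cens(ens, {"DP1": 50.0})
print(ens, cens)
```

Keeping the failure rate as a per-contribution field is what lets a condition-dependent rate for an individual transformer replace the type-average rate without touching the aggregation.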

In the OPAL methodology, different failure rates λ can be specified for each individual component, but the currently available input data are average annual failure rates for each type of component, as calculated from the FASIT system [31]. The following section presents a methodology to provide condition-dependent failure rates.

3 COMPONENT RELIABILITY MODEL

To understand and address condition-dependent failures, it is useful to distinguish such failures from other types of failures. This section describes a bottom-up approach to building a condition-dependent component reliability model. Power transformers are used as an example, but the approach applies generally. An advantage of a bottom-up approach as opposed to a top-down (e.g. regression) approach is that it better conveys the underlying processes or phenomena leading to failure.

3.1 Overall modelling approach

The approach divides the transformer into two parts: the active part (windings, core and oil) and all other parts (bushings, tap changer, tank, cooling system, etc.). It is common for transformer owners to have quantitative information on the condition of the active part from oil sampling and other measurements. For the non-active part, information on the condition is often not available in an easy-to-use quantitative format, or not available at all. For this reason, a condition-dependent failure model is pursued only for the active part in this paper. To establish the model, the active part is assumed to be non-repairable, since a failure in the active part often is difficult or expensive to repair, and therefore in practice often results in the whole transformer being scrapped. Failures in the active part and in the non-active part are assumed mutually exclusive, that is, simultaneous failures are disregarded. The failure rates of the active and non-active parts can then be established separately.

In general, the outage time also depends on which part of the transformer failed, and on whether the failure propagated to cause several of the transformer parts to fail. Here, however, the outage time is assumed to be the same for all transformer failures. This is a simplification, but outage time modelling is a demanding exercise that requires a separate analysis outside the scope of the present paper.

Active part failures are divided into internally and externally caused, as shown in Figure 2. This classification is based on a conceptual framework [32] that lays the foundation for the standardized fault and interruption data collection and reporting system that is implemented in the Norwegian power system (FASIT) [27]. The internally caused failures are divided into failures caused by defects from design, manufacturing or installation and failures caused by the degradation of technical conditions. The first of these corresponds to the early-life failures in the well-known bathtub failure curve, while the latter corresponds to the wear-out failures. The externally caused failures are divided into failures caused by natural hazards, operational stress and human threats. Together, these correspond to the mid-life failures in the bathtub curve, that is, failures giving a close to constant failure rate when averaged over a long time.

Figure 2. Simplified classification of failures in the transformer active part into main causes. The same classification can be applied to the non-active part, but this is not pursued in this article.

Based on the classification in Figure 2, a condition-dependent failure model for the active part is further developed in the next section. To do so, some simplifications are made: Early-life failures are neglected, and mid-life and wear-out failures are assumed mutually exclusive, although in reality there may be single failures that have multiple contributing causes.

In the absence of condition data, failures in the non-active part are treated simply as failures with a constant failure rate, independent of the transformer condition and age. The failure rate for the non-active part can then be obtained directly from failure statistics, that is, the failure rate is set equal to the observed historical average failure rate of representative transformers.

3.2 Failure model for the active part

The condition-dependent component failure model developed here is based on [7] and illustrated in Figure 3. It is developed for the transformer active part, but the model applies generally. Because the condition varies with time, the model output should be specified to be valid for a given time interval. Here we focus on the present, that is, a time horizon of approximately one year, and assume that component condition does not vary significantly within one year. This is a reasonable assumption in most cases, as condition typically does not vary appreciably within one year for long-living electric power components such as transformers. It does however mean that our approach is not suitable for components with rapid degradation.

Figure 3. High-level framework for condition-dependent component failure model to be included in the power system reliability of supply analysis.

Module (a) in Figure 3 aggregates component condition information. It estimates the overall condition of the component in terms of a health index from measured condition data [7]. The aggregation model should ensure that all wear-out failure causes are accounted for. The model for the transformer active part that will be used as an example is given in Section 4.2.

Module (b) introduces the notion of an apparent age as in Ref. [7]. The apparent age can be defined as the age implied by the component's health index when compared to the average health index of a reference data set (such as the total component population). An example of a formula established for deducing the apparent age of the transformer active part is given in Section 4.2.

Module (c) utilizes the concept of competing risks [33] to estimate the total component failure rate including both wear-out failures and all other failure types. Competing risks (or competing failure types) means that the first of the failure types to reach failure causes the component to fail. If the failure types are assumed independent, the total failure rate λ of the component is given by:
$$\lambda(t) = \sum_{l=1}^{N} \lambda_l(t) \tag{3}$$
where $\lambda_l$ is the failure rate for failure type l and N is the number of competing failure types. This paper includes mid-life and wear-out failures so that $N = 2$. When failure types are independent, $\lambda_l$ is given by:
$$\lambda_l(t) = \frac{dF_l(t)/dt}{1 - F_l(t)} \tag{4}$$
where $F_l(t)$ is the marginal cumulative distribution for the latent failure times of failure type l. The latent failure time is the hypothetical failure time for failure type l if the other failure types are not present. However, since in reality, a component is subject to several failure types, the latent failure time suffers from censoring by the other failure types. It also suffers from censoring by preventive retirement, which is a common asset management strategy.
Since mid-life failures represent failures that are independent of condition and give a close to constant failure rate when averaged over a long time, the mid-life failure rate $\lambda_{ml}$ is taken directly from failure statistics without using Equation (4). Wear-out failures on the other hand are determined by condition. To enable the condition to be taken into account, we follow [7] and assume that Equation (4) provides a better estimate of the wear-out failure rate for individual components if it is used as a function of apparent age instead of calendar age. The wear-out failure rate $\lambda_w$ is then estimated by:
$$\lambda_w(\tau(HI)) = \frac{f_w(\tau(HI))}{1 - F_w(\tau(HI))} \frac{d\tau}{dt} \tag{5}$$
where $HI$ is the health index, τ is the apparent age, $f_w = dF_w(t)/dt$, and $F_w(t)$ is the marginal cumulative distribution function of the time to wear-out failure. The derivative in Equation (5) is needed for the failure rate to be given per calendar time (and not per apparent time). Due to censoring, a statistical data set without bias to establish $F_w$ is hard to acquire, since long latent failure times are censored by other failure types and preventive retirement. An approach to reduce the censoring problem is to base $F_w$ on scrapping statistics in addition to wear-out failure statistics. An example taken from [7] is given in Section 4.2, where an estimate of $F_w$ is obtained by extrapolating the observed condition at scrapping to estimated potential wear-out failure times.
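To make Equations (3) to (5) concrete, the sketch below evaluates the wear-out hazard at an apparent age and adds a constant mid-life rate. The two-parameter Weibull form of $F_w$, the default $d\tau/dt = 1$, the function names and all parameter values are assumptions chosen for illustration only; the paper takes its estimate of $F_w$ from [7].

```python
import math


def weibull_survival(tau, shape, scale):
    """1 - F_w(tau) for an ASSUMED two-parameter Weibull distribution."""
    return math.exp(-((tau / scale) ** shape))


def weibull_pdf(tau, shape, scale):
    """f_w(tau) = dF_w/dtau for the assumed Weibull (shape > 1 expected)."""
    if tau <= 0.0:
        return 0.0
    x = tau / scale
    return (shape / scale) * x ** (shape - 1.0) * math.exp(-(x ** shape))


def wear_out_rate(apparent_age, shape, scale, dtau_dt=1.0):
    """Equation (5): hazard at the apparent age, converted to calendar time."""
    f = weibull_pdf(apparent_age, shape, scale)
    s = weibull_survival(apparent_age, shape, scale)
    return f / s * dtau_dt


def total_rate(mid_life_rate, apparent_age, shape, scale, dtau_dt=1.0):
    """Equation (3) with N = 2: mid-life rate plus wear-out rate."""
    return mid_life_rate + wear_out_rate(apparent_age, shape, scale, dtau_dt)


# Illustrative parameters only (not from the paper).
SHAPE, SCALE = 3.0, 60.0  # assumed Weibull shape and scale (years)
LAMBDA_ML = 0.005         # assumed mid-life failure rate (1/year)

# Two units of the same type whose health indices imply different apparent ages.
good = total_rate(LAMBDA_ML, apparent_age=20.0, shape=SHAPE, scale=SCALE)
poor = total_rate(LAMBDA_ML, apparent_age=45.0, shape=SHAPE, scale=SCALE)
print(f"good condition: {good:.4f}/year, poor condition: {poor:.4f}/year")
```

Computing the survival function directly, rather than as one minus the cumulative distribution, avoids loss of precision at high apparent ages where $F_w$ approaches one.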

Finally, module (d) calibrates the failure rate results to observed failure statistics and considers preventive retirement. This applies only to the wear-out failure rate since the mid-life failure rate is taken directly from statistics. Calibration is necessary for two reasons: (1) Censoring by other failure types and preventive retirement makes it difficult to acquire a statistical data set without bias, to correctly establish $F_w$. (2) Due to limited statistical data at present, it cannot be expected that bottom-up component failure modelling alone will accurately predict actual failure rates in absolute terms, although such modelling may be very useful for predicting relative failure rates for a set of components. The calibration method is described in Section 4, after preventive retirement first is accounted for in the next section.

3.3 Accounting for preventive retirement

The above failure model estimates the instantaneous wear-out failure rate. Failure statistics generally provide annual failure frequencies, that is, the expected number of failures per year. Hence, to enable the failure model to be properly calibrated to statistics, the annual failure frequency must be estimated from the instantaneous failure rate. Because preventive retirement is common, it can reduce the annual failure frequency. Before calibrating, we therefore account for preventive retirement.

To account for the effect that preventive retirement can have on the wear-out failure frequency, we introduce the state diagram on the left side of Figure 4. The diagram illustrates a component in a state of some degree of wear-out, called “Up, old”, from which the component can either transition to a failed state (“Down”) or be replaced preventively, thus reaching the state “Up, new”. In reality, there is a continuum of states between old and new, but as we are concerned with a limited time horizon we approximate this continuum by the two states “Up, new” and “Up, old”. The transition from “Up, old” to “Down” is given by the wear-out failure rate $\lambda_w(\tau(HI))$ discussed in the previous section, while the transition rate from “Up, new” to “Down” is $\lambda_w(0) \approx 0$ and is assumed to be negligible.

Figure 4. A state diagram describing the reliability due to wear-out failure (left). The state diagram for the simplified model describing wear-out failures (right).

In addition, the component may transition to the “Up, new” state by preventive replacement. The rate of preventive replacement is denoted $\lambda_{PM}$. If the component has failed, it is replaced and thus transitions to the “Up, new” state. This transition is given by the repair rate $\mu_w$. It is now reasonable to approximate the time to transition by exponential distributions, which means the diagram in Figure 4 may be treated as a Markov diagram.

Assume that we analyze a set of n components and that H I i $H{I_i}$ is the health index of component i. Furthermore, assuming that the outage time is short compared to the analysis horizon, the number of wear-out failures N i , w ( t 0 , t e n d ) ${N_{i,w}}( {{t_0},{t_{end}}} )$ of component i within the analysis horizon [ t 0 , t e n d ] $[ {{t_0},{t_{end}}} ]$ is independent of the outage time. The accuracy of these approximations has been verified for this case in a separate study [30]. This implies that by Wald's equation [34], the unavailability of the component due to wear-out failure is equal to:
U i , w = E N i , w t 0 , t e n d t e n d t 0 1 μ w = ω i , w μ w , $$\begin{equation}{U_{i,w}} = \frac{{E\left( {{N_{i,w}}\left( {{t_0},{t_{end}}} \right)} \right)}}{{{t_{end}} - {t_0}}}\frac{1}{{{\mu _w}}} = \frac{{{\omega _{i,w}}}}{{{\mu _w}}},\end{equation}$$ (6)
where E ( x ) $E( x )$ is the expected value of the random variable x, μ w ${\mu _w}$ is the repair rate, and ω i , w ${\omega _{i,w}}$ is the failure frequency.
The expected number of failures is:
E N i , w t 0 , t e n d = n = 1 n P N t 0 , t e n d = n . $$\begin{equation*}E\left( {{N_{i,w}}\left( {{t_0},{t_{end}}} \right)} \right) = \mathop \sum \limits_{n = 1}^\infty nP\left( {N\left( {{t_0},{t_{end}}} \right) = n} \right).\end{equation*}$$
As the wear-out failure rate of a new component is 0, and the condition is unchanged within the time frame, the probability of more than one failure is zero, that is, P ( N ( t 0 , t e n d ) = n ) $P( {N( {{t_0},{t_{end}}} ) = n} )$  = 0 for n > 1 $n > 1$ . It follows that E ( N i , w ( t 0 , t e n d ) ) $E( {{N_{i,w}}( {{t_0},{t_{end}}} )} )$ is equal to the probability of one failure. From the Markov diagram, it may thus be deduced that the failure frequency is:
ω i , w = P N i , w t 0 , t e n d = 1 Δ t = λ w τ H I i λ w τ H I i + λ P M Δ t 1 e λ w τ H I i + λ P M Δ t , $$\begin{eqnarray} {\omega _{i,w}} &=& \frac{{P\left( {{N_{i,w}}\left( {{t_0},{t_{end}}} \right) = 1} \right)}}{{{{\Delta t}}}}\nonumber\\ &=& \frac{{{\lambda _w}\left( {\tau \left( {H{I_i}} \right)} \right)}} {{\left( {{\lambda _w}\left( {\tau \left( {H{I_i}} \right)} \right) + {\lambda _{PM}}} \right){{\Delta}}t}} \left( {1 - {e^{ - \left( {{\lambda _w}\left( {\tau \left( {H{I_i}} \right)} \right) + {\lambda _{PM}}} \right){{\Delta}}t}}} \right),\nonumber\\ \end{eqnarray}$$ (7)
where $\Delta t = t_{end} - t_0$. To combine wear-out failures and other failures in a way that can be integrated into existing power system reliability analysis methods, the state diagram shown on the left side of Figure 4 is simplified to the diagram to the right in Figure 4. This is obtained by finding the equivalent failure and repair rates causing the unavailability of the component (i.e. the proportion of time spent in the state “Down”) to be preserved, that is:
$$U_{i,w} = \frac{\lambda_{i,w}}{\lambda_{i,w} + \mu_w} = \frac{\omega_{i,w}}{\mu_w}, \tag{8}$$
where the right-hand side is taken from Equation (6). Solving for $\lambda_{i,w}$ yields the corresponding equivalent wear-out failure rate:
$$\lambda_{i,w} = \frac{\mu_w\,\omega_{i,w}}{\mu_w - \omega_{i,w}} \tag{9}$$
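As an illustration, Equations (7) and (9) can be evaluated with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names and the numerical inputs (`lambda_w`, `mu`) are hypothetical and chosen only to show the mechanics.

```python
import math


def wearout_failure_frequency(lambda_w, lambda_pm, dt=1.0):
    """Equation (7): expected number of wear-out failures per unit time over dt."""
    total = lambda_w + lambda_pm
    if total == 0.0:
        return 0.0  # no failures and no retirement: frequency is zero
    return lambda_w / (total * dt) * (1.0 - math.exp(-total * dt))


def equivalent_failure_rate(omega, mu):
    """Equation (9): two-state failure rate preserving the unavailability omega / mu."""
    if omega >= mu:
        raise ValueError("failure frequency must be smaller than the repair rate")
    return mu * omega / (mu - omega)


# Hypothetical inputs (per year), for illustration only.
lambda_w = 0.01    # condition-dependent wear-out failure rate
lambda_pm = 0.033  # preventive retirement rate
mu = 12.0          # repair rate, i.e. an assumed mean outage of about one month

omega = wearout_failure_frequency(lambda_w, lambda_pm)
lam_eq = equivalent_failure_rate(omega, mu)
print(f"omega = {omega:.6f} per year, equivalent rate = {lam_eq:.6f} per year")
```

Because the repair rate is much larger than the failure frequency, the equivalent rate from Equation (9) is only marginally larger than the frequency from Equation (7).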

3.4 Calibration to observed failure statistics

We can now use these results to calibrate the failure model. The calibration should be done to a set of n components of a particular type (e.g. transformers) with a known average failure frequency, such as the total population of the component in a country, say Norway. We require the average wear-out failure frequency of the n components to be equal to the average observed wear-out failure frequency in Norway, that is:
$$\bar\omega_w = \frac{\sum_{i=1}^{n} \omega_{i,w}}{n} = \gamma_w\,\omega_s \tag{10}$$
where $\gamma_w$ is the fraction of the total statistical failure rate due to wear-out failures, and $\omega_s$ is the observed total failure frequency for the component type. The requirement (10) can be translated into an adjustment factor β for the failure frequency for each component given by $\beta = \gamma_w\,\omega_s / \bar\omega_w$, that is, the calibrated failure frequency of component i is:
$$\tilde\omega_{i,w} = \beta\,\omega_{i,w} \tag{11}$$

A corresponding calibrated failure rate $\tilde\lambda_{i,w}$ may be obtained from (8).
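The calibration step can be sketched as follows. The uncalibrated frequencies in `omega_raw` are hypothetical placeholders for the model output of a component population; only $\gamma_w$ and $\omega_s$ are taken from the case study values reported later in the paper.

```python
def calibration_factor(omega_raw, gamma_w, omega_s):
    """Adjustment factor beta = gamma_w * omega_s / mean(omega_raw), cf. Equation (10)."""
    mean_raw = sum(omega_raw) / len(omega_raw)
    return gamma_w * omega_s / mean_raw


# Hypothetical uncalibrated wear-out failure frequencies (per year) for four components.
omega_raw = [0.001, 0.002, 0.004, 0.010]
gamma_w = 0.12    # fraction of failures due to wear-out
omega_s = 0.0044  # observed total failure frequency

beta = calibration_factor(omega_raw, gamma_w, omega_s)
omega_cal = [beta * w for w in omega_raw]  # Equation (11)

# After calibration the population average matches gamma_w * omega_s.
mean_cal = sum(omega_cal) / len(omega_cal)
print(f"beta = {beta:.4f}, calibrated mean = {mean_cal:.6f}")
```

The scaling is uniform, so the ranking of components by failure frequency is unchanged; only the population average is moved to the observed level.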

3.5 Mid-life failures

All other failures of the component are treated as mid-life failures with a failure rate $\lambda_{ml}$. This rate is obtained by matching the model to the historical frequency of mid-life failures $\omega_{ml} = (1 - \gamma_w)\,\omega_s$, and once again using Equation (6) to preserve unavailability, that is, $\lambda_{ml} = \frac{\mu_{ml}\,\omega_{ml}}{\mu_{ml} - \omega_{ml}}$, where $\mu_{ml}$ is the repair rate of mid-life failures. The overall failure rate for component i is subsequently obtained from Equation (3) as $\lambda_i = \lambda_{i,w} + \lambda_{ml}$.
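A short sketch of this step is given below. The repair rate `MU_ML` and the wear-out rate `lambda_w_i` are hypothetical values used only for illustration; $\gamma_w$ and $\omega_s$ are the case study values reported later in the paper.

```python
def midlife_failure_rate(omega_s, gamma_w, mu_ml):
    """Mid-life failure rate that preserves the unavailability omega_ml / mu_ml."""
    omega_ml = (1.0 - gamma_w) * omega_s
    if omega_ml >= mu_ml:
        raise ValueError("failure frequency must be smaller than the repair rate")
    return mu_ml * omega_ml / (mu_ml - omega_ml)


def overall_failure_rate(lambda_w_i, lambda_ml):
    """Overall rate of component i: wear-out plus mid-life contribution."""
    return lambda_w_i + lambda_ml


OMEGA_S = 0.0044  # observed total failure frequency (per year)
GAMMA_W = 0.12    # fraction of failures due to wear-out
MU_ML = 12.0      # hypothetical repair rate (per year)

lambda_ml = midlife_failure_rate(OMEGA_S, GAMMA_W, MU_ML)
lambda_w_i = 0.0005  # hypothetical equivalent wear-out rate of one component
lambda_i = overall_failure_rate(lambda_w_i, lambda_ml)
print(f"lambda_ml = {lambda_ml:.6f}, lambda_i = {lambda_i:.6f} per year")
```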

4 CASE STUDY

To demonstrate the integration approach in the preceding section and investigate the effect of accounting for the component condition in a (power system-level) reliability of supply analysis, we consider a case study where we first compare the following two scenarios: In scenario 0 all transformers are assigned the average failure rate of transformers based on national failure statistics in Norway, while in scenario 1 each transformer is assigned an individual condition and an individual failure rate. Hence, scenario 0 is the traditional way of modelling component reliability in power system reliability analyses, while scenario 1 is the new way proposed here. It is assumed that repair rates for wear-out failures and mid-life failures are equal, that is, $\mu_w = \mu_{ml}$. Different repair rates could be accounted for as suggested by [2], but in the absence of firm knowledge about the difference in repair rates, this is left for future work.

The existing methods reviewed in Section 1.1 for integrating condition information in power system reliability analysis are all based on different sets of condition data and applied to a specific set of components. In the absence of component condition reference data sets that could be used to benchmark our method, scenario 0 serves as the benchmark for this case study. However, in Section 4.5 we also present additional scenarios designed to understand the impact of key input parameter uncertainties.

4.1 Test system

The network model considered for the case study is a 25-bus test system that represents a power system with four distinct areas. The network model is displayed in Figure 5. This test system represents small regions of the Nordic power system, and it has been developed and used for integrated power market and reliability analyses [25, 31]. Actual generation and demand for the different generator units and delivery points vary with the different operating states and are a result of power market simulations [25]. Data for the 208 operating states used in this case study are given in [35]. The system includes eight power transformers for which condition-dependent failures will be integrated in power system reliability analysis in the following. Where not otherwise stated, input data from [31] are used in the case study.

FIGURE 5. Test network considered in the case study (adapted from [31])

4.2 Transformer failure model

To establish the transformer failure rate $\lambda_w(\tau(HI))$ from Equation (5) it is necessary to establish a health index model $HI$, a relationship between the health index and apparent age $\tau(HI)$, and a probability distribution $f_w(\tau(HI))$. To illustrate this, the transformer model in [7] is adopted. The main features of the model are presented here; for a detailed presentation the reader is referred to [7]. The model applies to the active part of the transformer (windings, core, and oil), because this is the part for which condition information is readily measured and collected by transformer owners.

The health index is calculated based on a set of condition data and is designed to meet the following criteria: (1) The health index reflects the transformer reliability and is bounded both below and above, (2) Poor condition data are not masked by aggregation, that is, the health is never better than that indicated by the worst condition data. For details of the health index model, the reader is referred to [7].

The relationship between apparent age and health index, $\tau(HI)$, is established from nationally collected data for scrapped transformers. The relationship between health index and age is found by fitting a sigmoid function to the data, and the apparent age is found by inverting this sigmoid function. Furthermore, we assume that, in the future, apparent age and calendar age develop equally fast. This assumption means that the derivative in (5) is equal to 1.

The general wear-out failure time distribution $f_w$ is established by fitting a normal distribution to the potential lifetimes extrapolated from the investigation of the scrapped transformers, and the estimated parameters are $\mu = 60$ years and $\sigma = 18$ years. Note that the results from [7] must be used with caution, as the data material used in [7] is limited. Nevertheless, they are well suited for illustrating the case study here.
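To make the link between the fitted lifetime distribution and a failure rate concrete, the sketch below evaluates the hazard rate of a normal distribution with the parameters quoted above. It assumes that Equation (5) has the standard hazard-rate form $f_w(\tau)/(1 - F_w(\tau))$ multiplied by a derivative set to 1; the exact form used in the paper and in [7] may differ (for example through truncation at age zero), so this is an illustration only.

```python
import math

MU_LIFE = 60.0     # mean of the fitted lifetime distribution (years)
SIGMA_LIFE = 18.0  # standard deviation of the fitted lifetime distribution (years)


def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_sf(x, mu, sigma):
    """Survival function 1 - F(x) of a normal distribution."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def wearout_hazard(tau):
    """Assumed hazard-rate form f(tau) / (1 - F(tau)) at apparent age tau."""
    return normal_pdf(tau, MU_LIFE, SIGMA_LIFE) / normal_sf(tau, MU_LIFE, SIGMA_LIFE)


for tau in (20.0, 40.0, 60.0, 80.0):
    print(f"apparent age {tau:4.0f} years: hazard = {wearout_hazard(tau):.4f} per year")
```

The hazard grows monotonically with apparent age, which is what makes a transformer with a poor health index (high apparent age) stand out in the analysis.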

4.3 Transformer data

To populate the test system with realistic transformers, and use the above transformer failure model, real transformer condition data is needed. Data for a set of 18 Norwegian transformers is studied in [7]. These data include sufficient information, and eight of these transformers are selected for the test system. To investigate the importance of accounting for component condition, the transformer in the worst condition has been assigned to the branch in the test network that has the biggest contribution to annual ENS (branch 29). The other transformers are arbitrarily assigned.

To establish the failure rates for the eight transformers in the test system using the model in Section 3, some statistical data is needed. Failure data for power transformers in Norway are collected both in the FASIT database [27] and in a separate database run by the user group for power transformers in Norway. The data from these databases are not publicly available, but a preliminary unpublished analysis indicates an average failure frequency of $\omega_s = 0.0044$, including all transformer parts and failure types. The fraction $\gamma_w$ of the total failure rate related to wear-out of the active part is not explicitly given by the data in the databases, and it is not always evident which failures are caused by wear-out, but a preliminary analysis based on data reproduced in [35] suggests $\gamma_w = 0.12$. These parameters are similar to international statistics as provided by CIGRE [36]. Furthermore, the rate of preventive retirement in Norway is not known, but data from another non-public database indicates that the average transformer age of the population has been stable at around 30 years in recent years. This means that in a period of 30 years, all transformers are replaced. Neglecting the small failure rate, we can hence roughly estimate the retirement rate to be about $\lambda_{PM} = 1/30 \approx 0.033$ per year. Finally, the set of 18 transformers from [7] is assumed to be representative of the Norwegian transformer population, and the wear-out failure model is calibrated by using this set of transformers as described in Section 3.4. Applying Equations (10) and (11) results in the calibration factor $\beta = 0.12$. The statistical parameters are summarized in Table 1.
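For reference, the arithmetic behind these parameter values can be written out as follows. Only the quoted inputs (an average age of 30 years, $\omega_s = 0.0044$ and $\gamma_w = 0.12$) are used; the split into a wear-out target and a mid-life frequency follows Equation (10) and the definition of $\omega_{ml}$ in Section 3.5.

$$\begin{aligned}
\lambda_{PM} &\approx \frac{1}{30\ \text{years}} \approx 0.033\ \text{per year},\\
\gamma_w\,\omega_s &= 0.12 \times 0.0044 = 5.28 \times 10^{-4},\\
(1 - \gamma_w)\,\omega_s &= 0.88 \times 0.0044 = 3.872 \times 10^{-3}.
\end{aligned}$$

The second line is the average wear-out failure frequency that the calibrated model must reproduce, and the third is the mid-life failure frequency.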

TABLE 1. Parameter values used in Scenario 1 of the case study

| Parameter description | Parameter | Value |
|---|---|---|
| Overall historic failure rate | $\omega_s$ | 0.0044 |
| Rate of preventive retirement | $\lambda_{PM}$ | 0.033 |
| Proportion of failures related to wear-out of the active part of the transformer | $\gamma_w$ | 0.12 |
| Calibration for wear-out failures to observed failure rate | β | 0.12 |

4.4 Results for energy not supplied

To analyze the importance of accounting for condition, the reliability of supply indices for scenarios 0 and 1 are calculated and compared. In this section, we focus on the energy not supplied index; other reliability of supply indices are discussed in Section 4.6. Before running the reliability of supply analysis, the failure rates of the transformers are calculated and calibrated as described in Sections 3.3 and 4.3. The result of the calculation is displayed in Figure 6. The horizontal line shows the average failure rate $\omega_s$, which is used for all transformers in scenario 0 and used to calibrate the failure rate in scenario 1.

FIGURE 6. Failure rates for the transformers used in scenario 1. The failure rate is split into two components as described in Section 3

In the present case, preventive retirement reduces the wear-out failure rates by only 1.5%, while calibration has a larger effect and reduces them by 88%. The former is due to the short time horizon of only one year in the analysis. Preventive retirement will have a much larger effect on the failure rates when considered over a longer time horizon. Preventive retirement of transformers has been a common strategy in Norway for many years. This is an important reason why the observed failure rate in Norway corresponds to a mean time to failure (∼200 years) that is far longer than the typically observed lifetime of transformers (up to 60–70 years). Reasonable explanations for the calibration factor may be that the data set used to estimate the life distribution in [7] is biased, that the 18 transformers in [35] are not representative of the transformer population, or that there are additional measures that the operators take to prevent functional failure and that such measures are triggered by information not captured by the current lifetime model. The latter would imply a kind of censoring of wear-out failures.
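The dependence of the retirement effect on the analysis horizon can be illustrated directly from Equation (7). The sketch below compares the failure frequency with and without preventive retirement for different horizons; the wear-out rate `LAMBDA_W` is a hypothetical value, so the printed numbers indicate the order of magnitude rather than reproduce the paper's results.

```python
import math


def failure_frequency(lambda_w, lambda_pm, dt):
    """Equation (7): wear-out failure frequency over an analysis horizon dt."""
    total = lambda_w + lambda_pm
    return lambda_w / (total * dt) * (1.0 - math.exp(-total * dt))


def retirement_reduction(lambda_w, lambda_pm, dt):
    """Relative reduction in failure frequency caused by preventive retirement."""
    without_pm = failure_frequency(lambda_w, 0.0, dt)
    with_pm = failure_frequency(lambda_w, lambda_pm, dt)
    return 1.0 - with_pm / without_pm


LAMBDA_W = 0.005   # hypothetical wear-out failure rate (per year)
LAMBDA_PM = 0.033  # preventive retirement rate from Table 1 (per year)

for horizon in (1.0, 10.0, 30.0):
    reduction = retirement_reduction(LAMBDA_W, LAMBDA_PM, horizon)
    print(f"horizon {horizon:4.0f} years: reduction = {100.0 * reduction:.1f}%")
```

For a one-year horizon the reduction is between 1% and 2%, the same order as the value reported above, while for horizons of several decades it grows to tens of percent.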

The results of the reliability of supply analysis are summarized in Figure 7. The overall ENS increases by 1.23% from scenario 0 to scenario 1. Five delivery points see a decrease while two see an increase in ENS. As shown in Figure 7, the delivery point at bus 30020 sees an increased ENS of 6.5%. This is to be expected as this delivery point is most affected by outages of the transformer with branch ID 29, which was given the worst condition. Note that we have only modelled condition-dependent failures for transformers. The difference between scenarios 0 and 1 will be greater if condition-dependent failures are modelled also for other components in the test system.

FIGURE 7. Annual energy not supplied in proportion to scenario 0, where the colours show different delivery points (left). The relative difference in annual energy not supplied on each delivery point as compared to scenario 0 (right)

4.5 Sensitivities for energy not supplied results

The parameters in Table 1 are uncertain, and to investigate the importance of this uncertainty, two additional scenarios are constructed. This makes a total of four scenarios as summarized in Table 2. As the scenarios correspond to different values of the parameters $\gamma_w$ and β, we use the parameter values to specify the scenarios as shown in Table 2. The table also contains a qualitative description of each scenario and the relative increase in estimated annual ENS as compared to scenario 0.

TABLE 2. Summary of the 4 scenarios used to demonstrate the proposed methodology

| Number | $\gamma_w$ | β | Description | Relative difference in annual ENS (%) |
|---|---|---|---|---|
| Scenario 0 | 0 | 0 | This corresponds to using an average failure rate independent of condition. | 0 |
| Scenario 1 | 0.12 | 0.12 | This corresponds to applying the statistical data to scale the proportion of mid-life failures and calibrate the overall failure rate to match the overall historical failure rate. | 1.23 |
| Scenario 2 | 1 | 0.73 | This corresponds to assuming all transformer failures are due to wear-out failures of the active part. Calibration is employed. | 10.5 |
| Scenario 3 | 0.12 | 1 | This scenario serves two purposes; 1: Quantify the importance of correctly predicting the average failure rate. 2: Illustrate a situation where the condition of transformers is significantly worse than the national average. | 57.0 |

Scenario 2 is used to quantify the importance of using an accurate estimate of the proportion of wear-out failures $\gamma_w$. A value of $\gamma_w = 1$ is chosen as this is the largest possible value, and thus provides an upper bound on the influence that this parameter might have on the results. The overall annual ENS increased by 10.5%.

Scenario 3 is similar to scenario 1 but this time the failure rate predicted by the transformer model is not calibrated to statistical data. This scenario serves two purposes. Firstly, it quantifies the importance of correctly predicting the average failure rate and thereby the importance of including calibration. Secondly, it illustrates a situation where the condition of transformers is significantly worse than the average condition of transformers in the Norwegian population. The result is that the annual energy not supplied increased by 57.0%. The results for all four scenarios are compared in Figure 8.
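The way the two scenario parameters enter the component model can be sketched as below. This is a simplified illustration that works with failure frequencies only and skips the conversion to equivalent rates in Equation (9); the uncalibrated frequencies in `omega_raw` are hypothetical, while the $(\gamma_w, \beta)$ pairs are those listed in Table 2.

```python
OMEGA_S = 0.0044  # observed total failure frequency (per year)

# (gamma_w, beta) pairs as listed in Table 2.
SCENARIOS = {
    "Scenario 0": {"gamma_w": 0.0, "beta": 0.0},
    "Scenario 1": {"gamma_w": 0.12, "beta": 0.12},
    "Scenario 2": {"gamma_w": 1.0, "beta": 0.73},
    "Scenario 3": {"gamma_w": 0.12, "beta": 1.0},
}


def scenario_frequencies(omega_raw, gamma_w, beta, omega_s=OMEGA_S):
    """Total failure frequency per component: scaled wear-out part plus mid-life part."""
    omega_ml = (1.0 - gamma_w) * omega_s
    return [beta * w + omega_ml for w in omega_raw]


# Hypothetical uncalibrated wear-out frequencies for three transformers (per year).
omega_raw = [0.001, 0.004, 0.020]

results = {
    name: scenario_frequencies(omega_raw, **params)
    for name, params in SCENARIOS.items()
}
for name, freqs in results.items():
    print(name, [round(f, 5) for f in freqs])
```

With $\gamma_w = 0$ and $\beta = 0$ every component receives the population average, which is the traditional modelling in scenario 0; with $\beta = 1$ the uncalibrated wear-out contribution is used as is, which is why scenario 3 gives the largest spread between components.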

FIGURE 8. Annual energy not supplied in proportion to scenario 0, where the colours show different delivery points

4.6 Results for additional reliability indices

The reliability of supply analysis also produces results for other indices than the ENS index considered above [24]. We investigated the results for the indices unavailability (annual interruption duration) $U_a$, annual interruption frequency $\lambda_a$, and the average interruption duration $r_a$. The trends for $U_a$ and $\lambda_a$ are visually indistinguishable from those in the plots for $ENS_a$ in Figures 7 and 8. Plots for $U_a$ and $\lambda_a$ are therefore not included here. Neither are the results for the $r_a$ index, since the values for this index are almost identical across the four scenarios.

We do, however, include results for the expected annual cost of energy not supplied (interruption cost) index $CENS_a$ to illustrate the interplay between the criticality of the individual loads and the condition of the individual transformers. To do so we consider a variation of the data set in [31] where all delivery points are assumed to be residential except for the delivery point at bus 30020, which is assumed to be industrial. With the interruption cost data in [31], this results in $CENS_a$ values as shown in Figure 9. For this case, it is more important to account for the component condition when considering the costs of energy not supplied than when considering other reliability indices. The reason is that the most critical delivery point is the one most affected by outages of the transformer with the worst condition.
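The mechanism can be illustrated with a small numerical sketch in which the interruption cost is approximated as energy not supplied weighted by a specific cost per delivery point. All numbers below are hypothetical and are not taken from [31]; the 6.5% increase at the third delivery point only mimics the pattern seen for bus 30020, and the cost model actually used in the analysis may be more detailed.

```python
def relative_change(base, new, weights):
    """Relative change in the weighted total when going from base to new values."""
    base_total = sum(w * b for w, b in zip(weights, base))
    new_total = sum(w * n for w, n in zip(weights, new))
    return (new_total - base_total) / base_total


# Hypothetical annual ENS per delivery point (MWh) in the reference and condition-based cases.
ens_base = [10.0, 10.0, 10.0]
ens_new = [9.9, 9.9, 10.65]  # third point: +6.5%, the other two: -1%

energy_weights = [1.0, 1.0, 1.0]  # plain ENS: every MWh counts the same
cost_weights = [1.0, 1.0, 10.0]   # hypothetical specific costs: third point is industrial

ens_change = relative_change(ens_base, ens_new, energy_weights)
cost_change = relative_change(ens_base, ens_new, cost_weights)
print(f"ENS change: {100 * ens_change:.2f}%, cost change: {100 * cost_change:.2f}%")
```

When the delivery point with the highest specific cost is also the one whose ENS increases, the relative change in total cost exceeds the relative change in total ENS.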

FIGURE 9. Annual cost of energy not supplied in proportion to scenario 0, where the colours show different delivery points

5 CONCLUSIONS AND FURTHER WORK

The paper presents a methodology to integrate a condition-dependent component probability of failure model into the reliability of supply analysis. The paper illustrates the integration of a specific model for transformers, but this may easily be extended to other components. The modelling framework in Section 3 applies generally, except for the division into active and non-active parts, which can be omitted for other components such as poles. The quantification of the modelling framework described for transformers in Sections 4.2 and 4.3 must be established for each new component.

A case study using statistical data for Norwegian power transformers shows that, in the Norwegian power system, the proportion of failures that are due to poor condition is small. In a population where components at present are in good condition, the importance of taking condition into account will inevitably be limited in the short term. The sensitivity analysis shows the importance of adjusting the failure model to match representative statistical data. The results also show that if the condition of transformers is worse than in Norway, the impact of poor condition could be significant. This is especially important with respect to strategic decisions such as long-term renewal planning. Further work is, therefore, ongoing to extend the methodology to longer analysis horizons and more comprehensive modelling of preventive measures and asset management strategies. This will enable prediction of the decrease in reliability of supply over a time horizon during which the transformers are expected to degrade significantly, and assessment of appropriate measures that can be taken to counteract this, such as preventive retirement.

For lack of more detailed data, this case study used the same outage time for both mid-life failures and wear-out failures. The methodology does, however, allow for distinguishing the outage times for the two failure types and investigating the impact of longer outage times expected for wear-out failures. A natural extension of the proposed methodology is incorporating a more detailed and dynamic model for the outage time that can account for its dependence on component condition, time, location etc. This will increase the usefulness of the methodology in asset management, for example, spare parts logistics, and strategic planning.

Extending the methodology to longer time horizons requires methods to forecast how the component condition develops over time. To account for this, an alternative to the analytical methods considered here is to use Monte Carlo methods to sequentially simulate the condition and functional state of the component as a function of time. The simulation may include failure events, maintenance and other asset management measures, or other influencing factors. The simulated sequences (i) may be part of a sequential Monte Carlo reliability of supply analysis [2, 22], or (ii) may be used to calculate Monte Carlo estimates of the component failure rate or unavailability as a function of time. For the former approach, a first step has already been presented in [30]. There it was used to validate the analytical approach in the present work, and it will in the future be extended to consider different asset management strategies. For the latter approach, the Monte Carlo simulation would replace the method described in Section 3 to provide input to the analytical reliability of supply analysis, keeping the other parts of the methodology described in this paper unchanged.
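As a minimal sketch of approach (ii), the failure frequency over a single analysis horizon can be estimated by sampling competing exponential times to wear-out failure and to preventive retirement, and compared with Equation (7). This is not the simulation model of [30]; it assumes constant rates within the horizon and at most one wear-out failure per component, and all parameter values are hypothetical.

```python
import math
import random


def analytical_frequency(lambda_w, lambda_pm, dt):
    """Equation (7): analytical wear-out failure frequency over horizon dt."""
    total = lambda_w + lambda_pm
    return lambda_w / (total * dt) * (1.0 - math.exp(-total * dt))


def simulated_frequency(lambda_w, lambda_pm, dt, n_samples, seed=0):
    """Monte Carlo estimate: fraction of histories with a failure before retirement and dt."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        t_fail = rng.expovariate(lambda_w)     # time to wear-out failure
        t_retire = rng.expovariate(lambda_pm)  # time to preventive retirement
        if t_fail < dt and t_fail < t_retire:
            failures += 1
    return failures / (n_samples * dt)


LAMBDA_W = 0.05    # hypothetical wear-out failure rate (per year)
LAMBDA_PM = 0.033  # preventive retirement rate (per year)
HORIZON = 1.0      # analysis horizon (years)

exact = analytical_frequency(LAMBDA_W, LAMBDA_PM, HORIZON)
estimate = simulated_frequency(LAMBDA_W, LAMBDA_PM, HORIZON, n_samples=200_000)
print(f"analytical: {exact:.5f}, Monte Carlo: {estimate:.5f}")
```

In a full sequential simulation the constant rates would be replaced by a condition model that evolves over time, which is where the simulation approach gains its flexibility over the analytical expression.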

Introducing condition-dependent failure rates inevitably introduces more parameters in the model, each of which is associated with uncertainty. Including this uncertainty may be of great importance [5]. On a longer time horizon, the uncertainty in estimated reliability indices will increase, and it becomes even more important to quantify it. Further research is needed to quantify the uncertainty of the component reliability model and to find ways to propagate this uncertainty through the analysis.

In conclusion, this work has demonstrated steps towards better utilization of available information related to component conditions and failures in the reliability of supply analyses. Doing so also highlights relevant factors to consider (preventive replacement, future renewal and maintenance strategies, duration of forced outages etc.) and gives insight into remaining methodological challenges and data needs to further improve decision support for power system development and asset management.

AUTHOR CONTRIBUTIONS

H.T.: Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing - original draft, Writing - review & editing; J.F.: Conceptualization, Formal analysis, Investigation, Methodology, Resources, Validation, Writing - original draft, Writing - review & editing; I.B.S.: Conceptualization, Data curation, Methodology, Project administration, Validation, Writing - original draft, Writing - review & editing.

ACKNOWLEDGMENTS

The research leading to this publication received funding from the Research Council of Norway under Grant 308781 (“VulPro”) and in part from Statnett (the Norwegian Transmission System Operator), Landsnet, and the Norwegian Water Resources and Energy Directorate.

CONFLICT OF INTEREST

The authors have no conflict of interest to disclose.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are openly available in Zenodo at doi:10.5281/zenodo.6127968.