Islanding detection in distributed energy resources based on gradient boosting algorithm

This paper proposes a novel passive intelligence-based method for anti-islanding. Passive methods generally suffer from the improper tuning of threshold values for measured variables and from false detection when the active or reactive power mismatch is small. Conversely, intelligence-based methods highly depend on the choice of an appropriate model, the universality of the data and the selected features. In addition, the risk of overfitting and underfitting for a single model due to an improperly selected feature or insufficient data is always present. This condition makes a single classification model unreliable for practical purposes. The method used in this work is a sequential ensemble of intelligence-based models called gradient boosting. The sequential form of the ensemble is specifically designed to solve the underfitting problem. In addition, by selecting the best combination of features and ensembles, overfitting can be prevented and the attribute data space can be constructed with maximum discrimination. Furthermore, this condition yields a model with a zero non-detection zone. To this end, a multi-objective optimisation method called the non-dominated sorting genetic algorithm-III is used to tune the model. The reliability and economy of this method for distribution networks with various types of inverter-based and synchronous distributed energy resources are demonstrated.


INTRODUCTION
The increasing penetration and integration of distributed energy resources (DERs) and the complexities of their operating modes cause novel difficulties to emerge in modern power systems. This ever-increasing growth poses problems for the economic, safe and stable use of DERs to supply loads in new types of networks, such as microgrids. Modern DERs can operate connected to the power system or in stand-alone mode under various control schemes. Hence, it is necessary to detect unintentional islanding and switch between operational or control modes in time to maintain the stability of the system. Unintentional islanding is a condition in which one or more DERs remain connected and feed one or more local loads while they are disconnected from the main power system. This condition poses many threats and security problems.
Islanding detection methods (IDMs) are broadly divided into remote and local methods. In remote IDMs, the status of the utility-side switches and circuit breakers is monitored through a communication link; based on their connection or disconnection, the islanded region is detected, and control signals are sent for post-fault operation of the islanded region. Remote methods have zero non-detection zones (NDZs), high reliability, degradation-free power quality and applicability to multi-DER systems. However, these methods are complicated. In addition, vast infrastructure changes and high operational costs are unavoidable. Examples of remote IDMs are introduced in [1] and [3].
In local IDMs, all measurement, detection and decision-making are done near the utility. These methods are mainly classified into three groups: active, passive and hybrid methods. Active methods have a very small NDZ. However, their main drawback is that they complicate and degrade the control system of the DER, leading to a longer detection time and power quality problems because a perturbation is continuously injected into the grid. In modern power systems, DERs can operate in a post-islanded condition. For post-islanding operation of DERs, it is necessary to change between control modes, and the system is more susceptible to disturbances. Thus, active methods are not a good choice because they drive the islanded part towards instability and prevent post-islanding operation and control of the network. Examples of active methods are introduced in [1] and [4]. On the other hand, passive IDMs are economical, fast and have no effect on power quality in either grid-connected or islanded operation. However, these methods suffer from a large NDZ and are sensitive to the adjustment of threshold values [1,3]. To overcome the drawbacks of traditional local methods, hybrid IDMs have been suggested. The general procedure for islanding detection with these methods is to apply a passive method until a suspicious condition is detected, which then activates an active method [5,6]. Hybrid methods can mitigate the power quality and NDZ problems, but the detection time worsens in some cases, and they complicate the DERs' control systems, particularly the inverter-based ones. Nonetheless, the risk of instability in the post-islanding condition remains.
To attain secure and economical islanding detection, adapt it to contemporary inverter-based DER technology and avoid the aforementioned deficiencies of the traditional methods, various modifications of local methods have been proposed. In [7], parameters are transformed to the space vector domain before setting the threshold values. However, this method depends on the type of DER and is only applicable to synchronous DERs. In addition, it depends on extracting hidden characteristics of system variables using digital filters. In another study, the set point of the rate of change of frequency (ROCOF) relays is adjusted using alternative methods, such as prior knowledge of the power curve and the study of all power imbalances of the DERs [8]. This method also suffers from a large NDZ, and it depends on daily load curves. In [9], a harmonic tuned filter was installed at the DERs to extract a specific harmonic order of the impedance, and in [10], the second harmonic voltage of the inverters was extracted for islanding detection. However, these methods depend on setting a predefined threshold for islanding detection and on the network size and topology. Methods such as deviation of the oscillation frequency are suggested in [11]. That work uses a chaos-theory oscillator called the Helmholtz oscillator, which discriminates strongly between chaotic and normal signals. The dynamic behaviour of loads and the derivative of their equivalent resistance in a microgrid are used as an IDM in [12]. These methods can detect islanding with higher accuracy, but their main drawbacks are the multiple threshold settings, which are highly affected by noise, and delays in detection time.
Subsequent efforts have focused on intelligent methods and their combination with traditional active, passive or signal processing methods [13-16]. The outstanding feature of intelligence-based data mining (IBDM) methods is that the thresholds are not set manually; the models learn to adjust themselves according to the data. However, these methods strongly depend on the universality of the data, the measured values (features) and the possible training scenarios. IBDM methods incur higher costs as the amount of data-gathering equipment increases, and basic classifiers, such as a decision tree (technically called weak learners), usually have high classification errors in a nonlinear data space. Therefore, signal processing methods are introduced in [17-20] to extract outstanding features of the data and minimise the use of measurement devices. Strong classification methods, such as support vector machines (SVMs) [21] or extreme learning machines (ELMs) [22], are used to classify the attribute data space more precisely. These solutions increase accuracy substantially, decrease the NDZ dramatically and shorten the detection time. However, their programming and processing are complicated, and they entail a high risk of overfitting due to falsely measured data. This situation makes these methods inapplicable for practical purposes.
As a result, this paper introduces an ensemble learning method called gradient boosting to construct a strong model from weak learners, such as decision trees. This classifier is easy to programme, and it matches the performance of a strong IBDM model, such as an SVM or ELM. Moreover, sensitivity to false measurements, noise, the NDZ and overfitting are progressively minimised using a novel multi-objective optimisation method called the non-dominated sorting genetic algorithm-III (NSGA-III) [23]. NSGA-III also helps to detect, easily and deterministically, all possible models that have the following characteristics:

1. An optimal number of features and a minimum number of measurement devices.
2. An appropriate construction of the ensemble that minimises the processing time and NDZ and is feasible to programme in a local protection system (minimum complexity).
3. The best sliding time window.
4. A minimum number of additional features that can reduce the NDZ to a desirable level.
Ensemble methods are advantageous primarily because they attempt to fit the model to the data step by step, and only a few integer parameters must be defined. They are easier to programme and demonstrate the same performance as conventional strong classifiers. NSGA-III helps to define these parameters optimally so that the total model selects the best feature space with minimum generalisation error and robustness to overfitting.
The remainder of this article is organised as follows. Section 2 introduces the topology of gradient boosting and the tuning of NSGA-III for optimal feature selection. Section 3 develops the proposed method for the power system. Section 4 describes the test system. Simulation results are presented in Section 5 and compared with those of previous IBDM methods in Section 6. Finally, the conclusion is drawn in Section 7.

NSGA-III-BASED TUNING OF THE GRADIENT BOOSTING ALGORITHM
Intelligence-based IDMs provide a fast detection time and high accuracy. These methods are technically classified as weak (basic) learners, such as the decision tree and k-nearest neighbour methods, or as strong learners, such as the SVM and ELM. Weak learners are easy and straightforward to tune and train. Unfortunately, they classify the attribute data space linearly, underfit and demonstrate poor performance for nonlinearly dispersed data. Intelligence-based islanding detection with weak learners decreases the NDZ compared to traditional methods. However, because of the nonlinearity and complexity of the power system's feature space, these methods still leave considerable NDZs, particularly for inverter-based DERs [15].
In contrast, strong learners can classify the attribute space nonlinearly and more flexibly. High accuracy may be achieved by tuning the parameters and coefficients of the model well. However, proper tuning is challenging and not straightforward. In addition, concentrating on adjusting these parameters to increase the test data accuracy increases the risk of overfitting. Therefore, they are impractical for real system implementation.
Altogether, weak and strong classification methods train only a single model (mono-classifier) for classification. If this model is trained improperly because of an erroneous measurement or outlier data, the entire decision is adversely affected and fails. For these reasons, intelligence-based models with a mono-classifier are not sufficiently reliable for practical purposes.

Gradient boosting
Here, instead of using a mono-classifier, an ensemble of classifiers is implemented. In ensemble learning, a group of models participates in the training and decision-making process. The main goal is to iteratively suppress the decision power of the models with a high prediction error and empower the correctly predicting models, adjusting the training speed if necessary. The decision is taken by a majority vote of the models to minimise the error rate. Ensemble methods are generally classified into two categories: bagging and boosting. In bagging, the main goal is to avoid overfitting; every model is trained separately on a randomly selected part of the training data, and the average vote of the models is used for the decision. In boosting methods, on the other hand, the main goal is to construct a strong model from weak learners. The total model helps to solve the underfitting problem of weak learners and is sufficiently strong to classify the data space nonlinearly. Moreover, it is straightforward and less complex to tune and train. Boosting methods are trained in a sequence, and in every iteration, either the mistakes of the previous models are compensated for (Adaboost) or the classification error is slowly (step by step) decreased (gradient boosting). Adaboost [24] and gradient boosting [25] are two outstanding ensemble boosting methods. In Adaboost, every data point has a weight factor, and in every iteration, the weight factors of wrongly classified data increase while those of the others decrease. Adaboost cannot demonstrate good performance in a complex data space or in the presence of incorrectly classified data and outliers, because in every iteration the model progressively concentrates on classifying these data. In addition, it is very sensitive to the selected features of the attribute space, and it overfits in some feature spaces.
In gradient boosting (see Figure 1), instead of emphasising misclassified data, the classification error over all data is defined as a function to be minimised in every iteration. As stated in Table 1, first, a simple prediction model F_0(x) is defined as the probability of the data belonging to a class. The goal is to minimise a differentiable loss function (the mean square error) using the gradient descent algorithm (a first-order iterative optimisation algorithm for finding the minimum of a differentiable function). In every iteration (round), the distance between the prediction and the class of the data is considered the error of the model. These errors are used to calculate the gradient, that is, the partial derivatives of the loss function L(y, F) = ½ (y_i − F_i(x))². This procedure yields the residuals of the model (r_i = y_i − F_i(x)). It helps to find the steepness of the error function, which determines the direction in which the model parameters should change to reduce the error as much as possible in the next training iteration (round). In the next step, a weak regression model (a regression tree) h_i(x) is fit to the residuals to obtain a model with lower error. This procedure continues iteratively until the optimisation process reaches a point with minimum error and the residuals become close to zero (or until an appropriate number of rounds are completed). Then the algorithm has reached the minimum of the loss function, and movement in any direction will not improve the performance.
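For clarity, and using ν for the shrinkage parameter introduced below and m for the round index, the steps described above can be written compactly as

L(y_i, F(x_i)) = ½ (y_i − F(x_i))²,
r_i^(m) = −∂L(y_i, F(x_i)) / ∂F(x_i) evaluated at F = F_(m−1) = y_i − F_(m−1)(x_i),
F_m(x) = F_(m−1)(x) + ν h_m(x),  0 < ν < 1,

where h_m(x) is the regression tree fitted to the residuals r^(m) in round m.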
Notwithstanding all its advantages over traditional strong classifiers, gradient boosting is still a strong model, and it greedily learns the training data in every iteration. This, in turn, can cause the model to construct complex regression trees with high depth, particularly when the model has many features. Thus, to increase the accuracy and minimise the overfitting risk of gradient boosting, in every iteration, the residual is multiplied by a shrinkage parameter ν (0 < ν < 1). This parameter makes the model compensate for only a portion of the total error in every iteration. Therefore, misclassified data with a large effect on the error function are prevented from dominating the model. However, a very small shrinkage parameter decreases the convergence speed of the model, so a trade-off is necessary. The gradient boosting procedure can be summarised as follows:
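Since the original pseudocode is not reproduced here, the following Python sketch illustrates the training loop described above for the squared-error loss. It is an illustrative reconstruction, not the authors' MATLAB implementation; scikit-learn regression trees stand in for the MATLAB binary regression tree models, and a binary label vector y ∈ {0, 1} is assumed.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boosting_fit(X, y, n_rounds=50, max_depth=3, shrinkage=0.1):
    """Train a gradient-boosted ensemble of regression trees (squared-error loss).

    X : (n_samples, n_features) feature matrix
    y : (n_samples,) binary class labels in {0, 1}
    Returns the initial prediction F0 and the list of fitted trees.
    """
    y = np.asarray(y, dtype=float)
    # F0(x): a constant initial prediction (here, the class prior probability)
    F0 = y.mean()
    F = np.full(len(y), F0, dtype=float)
    trees = []
    for _ in range(n_rounds):
        # Negative gradient of L(y, F) = 0.5 * (y - F)^2 w.r.t. F is the residual y - F
        residuals = y - F
        # Fit a weak regression tree h_m(x) to the residuals
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # Move the model a small step (shrinkage) in the direction that reduces the loss
        F += shrinkage * tree.predict(X)
        trees.append(tree)
    return F0, trees

def gradient_boosting_predict(X, F0, trees, shrinkage=0.1, threshold=0.5):
    """Score new samples and threshold the result to obtain class labels."""
    F = np.full(np.asarray(X).shape[0], F0, dtype=float)
    for tree in trees:
        F += shrinkage * tree.predict(X)
    return (F >= threshold).astype(int)
```

With a shrinkage value well below one (e.g. 0.1), each round compensates for only part of the residual error, which is exactly the trade-off discussed above.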

NSGA-III-based optimisation of gradient boosting
Selecting the data space is an important part of constructing an intelligence-based model. The data space is a combination of the features (attributes) that help intelligent methods to learn and classify data. A suitable data space is one in which there is maximum discrimination between the data of every class and an optimum number of features to avoid complexity. In addition, the risk of overfitting is minimised in this space. Using ensemble learning methods, a strong classification model can be constructed simply by training a group of basic models. These models are easy and straightforward to train and to programme on local measurement devices because they contain only a group of comparison trees (regression trees). Unlike an SVM or ELM, they do not have parameters that can strongly affect the overall result through small changes. Moreover, in gradient boosting, the model concentrates on minimising the entire classification error instead of attempting to classify individual misclassified data. Therefore, weak models that overfit to erroneous measurements and outliers are suppressed in every learning iteration, and their effect on the overall result is minimised.
Gradient boosting is constructed by adding weak learners until the error stops decreasing. In every iteration, weak learners are proposed to minimise the loss function as much as possible, without any limitation on increasing the depth of the regression trees. This greedy behaviour of gradient boosting can increase the risk of overfitting, specifically in a large and complex data space. By selecting proper parameter values and the optimum combination of features (the best data space), it is possible to eliminate the probability of overfitting and to remove the NDZ in islanding detection altogether. A well-constructed gradient boosting model with a proper data space can positively affect the performance of the model. Here, a multi-objective optimisation algorithm called NSGA-III is used to optimise the performance of the gradient boosting (its flowchart is shown in Figure 2).
Compared to previous versions, the main property of NSGA-III is a series of reference points that help to extract the Pareto front (groups of optimal answers with no superiority over each other) more homogeneously, specifically in data with high dimensions. These points help to improve the diversity and convergence of NSGA-III and boost its performance for problems with more than two objectives. The algorithm begins with an initial feasible solution (initial population) and a series of widely separated reference points. These points are placed on a normalised hyperplane using the systematic approach of Das and Dennis [26]. The hyperplane is equally inclined to all objective axes and has an intercept of one on each axis, so the reference points are spread evenly over it. The number of divisions p on every axis is selected by the user, and it is important to compromise between the speed and the diversity of the problem. In this work, p = 4 divisions are selected. Therefore, for an M = 3 objective problem, the total number of reference points is C(M + p − 1, p) = C(6, 4) = 15. In the next step, to maintain the diversity of the problem, every population member is associated with a reference point. Mutation and crossover are applied, and the elite members are selected and sorted for the remainder of the algorithm. Then, using a non-dominated sorting method, the members are sorted into fronts (F_1, F_2, …) of equal non-domination rank.
The new normalised populations are associated with reference points by their shortest perpendicular distance to these points. Finally, using a niche preservation strategy, new reference points are added to existing ones and the reference points with zero associated members are omitted.
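As a concrete illustration of the reference-point construction described above, the following Python sketch generates the Das and Dennis points for M = 3 objectives and p = 4 divisions. It is a generic implementation of the systematic approach, not taken from the authors' MATLAB code.

```python
from math import comb
import numpy as np

def das_dennis(n_obj, n_div):
    """Generate uniformly spaced reference points on the unit simplex
    (Das and Dennis systematic approach used by NSGA-III)."""
    points = []

    def recurse(prefix, left, depth):
        if depth == n_obj - 1:
            points.append(prefix + [left / n_div])
            return
        for i in range(left + 1):
            recurse(prefix + [i / n_div], left - i, depth + 1)

    recurse([], n_div, 0)
    return np.array(points)

ref_points = das_dennis(n_obj=3, n_div=4)
# Number of points equals C(M + p - 1, p); for M = 3, p = 4 this is C(6, 4) = 15
assert len(ref_points) == comb(3 + 4 - 1, 4) == 15
assert np.allclose(ref_points.sum(axis=1), 1.0)  # each point lies on the normalised hyperplane
```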
In power systems, it is possible to select various types of measurements as features and thereby define an attribute data space. Optimisation-based feature selection helps to find an attribute space (the best combination of features) in which there is maximum discrimination between the data of every class. In addition, there is a vast number of possible ensemble configurations. It is essential to define the number of learning rounds (iterations), that is, the total number of base learners. Furthermore, it is necessary to determine the maximum depth of the regression trees and the shrinkage parameter. In selecting these conditions, a compromise between the accuracy of the testing and training results and the complexity of the model (the least complex model) is needed. It is obvious that there are multiple optimum answers that have no superiority over each other, and optimising one condition contradicts optimising another. Hence, a multi-objective optimisation is suitable for detecting all optimal models. NSGA-III is the metaheuristic, multi-objective optimisation algorithm selected for reaching these goals.
Gradient boosting and NSGA-III are coded in the MATLAB R2018b script environment, and to construct gradient boosting, MATLAB binary regression tree models are used.

PROPOSED METHOD
Local IDMs are advantageous mainly because all measuring, processing and detection are performed near the utility. They do not need infrastructure for transferring all the DER data to a central processing system. They are fast and economical, and different groups of measurements may be selected to construct the attribute data space. A proper feature selection (optimisation) method then detects the minimum number of features in the feature space for which there is maximum discrimination between the data of each class (the best feature space).

Feature selection
In IBDM, many measured and calculated (processed) features may be selected. In the literature, various combinations of these features are used [9,13,15,16,21,27,28]. However, no explanation is provided about the optimum combination of these features, why their selection is suitable, or whether there is a dependence between these features and the type of DER. In addition, no explanation is provided about the possibility of boosting the performance of the model or decreasing the NDZ by adding one or more features, and what these features are. This paper demonstrates that, if a sufficient number of features is available, NSGA-III can extract the best combination of them. Therefore, based on the economic conditions and the expected reliability and accuracy of the model, it is possible to select between the features. Here, to prove the capability of the method in selecting the appropriate combination from a large number of attributes, the NSGA-III feature selection process is performed on the feature sets in Tables 2-4. However, in practical applications, the feature selection can be performed on a subset of these features according to the measurement devices of the DERs and the calculated features.
In this work, signals of limited time length around the disturbance are utilised. However, in real-world applications, the method operates on continuous electrical signals discretised with a specific sampling frequency. Therefore, for online application, a sliding time window is necessary to handle a limited number of samples. In this work, for every type of DER, the best sliding time window is selected in accordance with the number of features. To extract these features, the network is simulated until 15 periods after the occurrence of the islanding disturbance, and the features are extracted or calculated. An answer is sought to the question of how long data gathering should be for a specific DER, that is, a compromise between the window length and the number of features. Therefore, at first, a two-period time window after the disturbance is selected and the algorithm is applied to extract the best combination of features. The procedure is then repeated for a three-period time window and so on, up to 15 periods after the islanding. In this way, a time window with a more optimal feature set may be found. Some of these features are extracted by measurement devices placed near the DERs and on the high-voltage side of the transformer, simulated in MATLAB R2018b. These measurement devices measure the instantaneous voltage, instantaneous current, RMS voltage and current, instantaneous and mean values of the active and reactive power, ROCOF and rate of change of power. The frequency is extracted by the phase-locked loop block of the inverter-based DERs. In addition, other features are generated by processing these signals, such as the frequency-to-active-power change or deviation [26]. The deviation index of a signal is calculated as the difference between the maximum and minimum of the signal in the sliding time window. Furthermore, the rate of change of each signal is calculated in the time window, and its maximum value is used as training data for the model. The signals in feature set 3 and their first- and second-order derivatives are sampled with a specific sampling frequency (64 samples per period in a 60-Hz system); the first- and second-order discrete derivatives of a signal x[n] with sampling interval Δt are approximated as x′[n] = (x[n] − x[n−1])/Δt and x″[n] = (x[n] − 2x[n−1] + x[n−2])/Δt². Finally, simple statistics are calculated for all features and used in the training of the models: the mean value (1/N) Σ x[n], the energy Σ x[n]² and the standard deviation over the N samples of the window. Other features (feature set 2), such as the total harmonic distortion (THD) of the voltage and current, the voltage phase and the phase difference between voltage and current [21,27] (extracted by phasor measurement units near the DERs), are also considered eligible for islanding detection. This approach avoids complex signal processing techniques, such as the wavelet or Hilbert-Huang transform, to make the model applicable and feasible for local islanding detection devices. In total, 84 features are simulated and extracted by the measurement blocks, from which NSGA-III selects the best feature space.
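As an illustration of how such window-based statistics can be computed, the sketch below derives the deviation index, maximum rate of change, discrete derivatives, mean, energy and standard deviation from one sampled signal. The window length and the placeholder waveform are assumptions for demonstration only; this is not the authors' extraction code.

```python
import numpy as np

def window_features(x, dt):
    """Compute simple per-window features from one sampled signal x
    (e.g. RMS voltage, active power or frequency) inside a sliding time window.

    x  : 1-D array of samples within the window
    dt : sampling interval in seconds (1 / (64 * 60) for 64 samples per 60-Hz cycle)
    """
    x = np.asarray(x, dtype=float)
    d1 = np.diff(x) / dt            # first-order discrete derivative
    d2 = np.diff(x, n=2) / dt**2    # second-order discrete derivative
    return {
        "deviation": x.max() - x.min(),          # deviation index in the window
        "max_rate_of_change": np.abs(d1).max(),  # maximum rate of change
        "mean": x.mean(),
        "energy": np.sum(x**2),
        "std": x.std(),
        "mean_d1": d1.mean(),                    # statistics of the derivative signals
        "energy_d2": np.sum(d2**2),
    }

# Example: a 5-cycle window of a 60-Hz signal sampled at 64 samples per cycle
fs = 64 * 60
t = np.arange(5 * 64) / fs
signal = np.sin(2 * np.pi * 60 * t)     # placeholder for a measured waveform
features = window_features(signal, dt=1 / fs)
```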

Algorithm implementation
For accurate anti-islanding, three objective functions (OFs) are used in NSGA-III to select the best models, which are robust to overfitting and exhibit maximum discrimination between classes. The first OF is the minimum number of features (N_feature), which keeps the number of measurement devices small and addresses economic issues. The second OF minimises the complexity of the models to be programmed: the maximum number of splits per regression tree (N_split, i.e. the maximum depth of the tree) and the total number of base learners (N_base_learner, the number of learning rounds) should be minimised. However, minimising these values should not deteriorate the gradient boosting performance. For this, in the third objective function, the training error (Train_err %) and test data error (Test_err %) must be minimised (to compensate for the performance of the first objective function). In addition, to decrease the data sensitivity of the constructed model, the average loss of the five-fold cross-validation (CV_fold_loss) should be minimised (to compensate for the performance of the second objective function). The main properties of an overfitted model are a low training error together with a high testing error and a high cross-validation loss. Hence, by minimising these quantities in the third objective function, the selection of models that are not overfitted is guaranteed. In addition, when the cross-validation loss is minimised, the sensitivity of the model to new unseen data is minimised. This, in turn, reinforces the zero-NDZ capability of the model.
Thus, the NSGA-III objective functions are the minimisation of (i) N_feature, (ii) N_split and N_base_learner, and (iii) Train_err %, Test_err % and CV_fold_loss. Consequently, the NSGA-III algorithm can find all possible solutions in which an optimum attribute space (features), minimum complexity, minimum error and robustness to overfitting are granted.
Moreover, the inputs (members of the population) of the NSGA-III are a binary vector of selected or non-selected features, the maximum depth of the trees and the number of learning rounds. Every member is evaluated in the NSGA-III cost function; in every evaluation, the corresponding features are selected and extracted to execute gradient boosting, cross-validation and the other objective functions. The evaluation process of the NSGA-III yields the Pareto front (the best results of the objective functions) for further study, so that the appropriate model can be selected based on the economic and safety conditions of the power system (see Figure 3).
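To make the evaluation step concrete, the following Python sketch decodes one such population member and returns the three objective values. It uses scikit-learn's gradient boosting and a 70/30 split in place of the authors' MATLAB implementation, and the way the split/round counts and error terms are aggregated into single objective values is an assumption made here for illustration; the returned vector would then be handed to any NSGA-III implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

def evaluate_member(member, X, y):
    """Decode one NSGA-III population member and return the three objective values.

    member : dict with
        'mask'      - binary vector selecting features (columns of X)
        'max_depth' - maximum depth of each regression tree
        'n_rounds'  - number of learning rounds (base learners)
    Returns [OF1, OF2, OF3], all to be minimised.
    """
    mask = np.asarray(member["mask"], dtype=bool)
    if not mask.any():                       # an empty feature set is infeasible
        return [np.inf, np.inf, np.inf]
    Xs = X[:, mask]

    model = GradientBoostingClassifier(
        n_estimators=member["n_rounds"],
        max_depth=member["max_depth"],
        learning_rate=0.1,                   # shrinkage parameter
    )

    X_tr, X_te, y_tr, y_te = train_test_split(Xs, y, test_size=0.3, random_state=0)
    model.fit(X_tr, y_tr)

    train_err = 1.0 - model.score(X_tr, y_tr)
    test_err = 1.0 - model.score(X_te, y_te)
    # Average loss of the five-fold cross-validation (1 - accuracy)
    cv_loss = 1.0 - cross_val_score(model, Xs, y, cv=5).mean()

    of1 = mask.sum()                                   # number of selected features
    of2 = member["max_depth"] + member["n_rounds"]     # complexity of the ensemble
    of3 = train_err + test_err + cv_loss               # accuracy / overfitting terms
    return [of1, of2, of3]
```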

TEST SYSTEM DESCRIPTION
The test system [15] in Figure 4 is used to validate the results. To check the independence of the proposed model from the DER type, various types of DERs are modelled in the network. In this system, two wind farms are connected to the first feeder. The first DER is a doubly fed induction generator (DFIG_wind) with a rated value of 6.6 MVA and 575 V. It is a rotating-machine DER that is partially connected to the network through an inverter on the rotor side and directly connected to the power system through the stator. The second DER is a synchronous generator (SG_wind), a type-4 full-scale converter unit with a rated value of 6.6 MVA and 575 V. It is a rotating-machine DER with full connection to the power system through an inverter. These models are equipped with a battery energy storage system (BESS) at the DC terminal of the inverters [29]. The feeder is energised with a 12.5-MW rated load and a 25-kV line voltage, and the frequency is 60 Hz. Both wind farms are simulated in the voltage and power control mode scenarios, with the BESS active or inactive. Both turbines are tested in the full-, half- and minimum-loading conditions. The second feeder is equipped with a 500-kW, 260-V photovoltaic (PV) farm (an inverter-based, non-rotating DER) operating in the power control mode. Finally, a 3.125-MVA, 2.4-kV synchronous diesel generator (Diesel) is modelled in the second feeder. This feeder is energised with a 3.5-MW load and a 25-kV, 60-Hz line voltage. The substation transformer is rated at 120/25 kV. All transformer rated capacities and connections are depicted in Figure 4. The test system is simulated in MATLAB R2018b software, and the features are measured in the simulation.

Simulation scenarios
Anti-islanding involves using control systems or relays to prevent the continued operation of an unintentional island. Based on the requirements in [30], the DER should detect the island and cease to energise the islanded area within 2 s. In addition, according to IEEE Standard 929-2000 [31], in PV systems, if the mismatch between generation and consumption is more than 50% or the islanding power factor is less than 0.95, the DER should cease to energise the islanded electric power system (EPS) in less than 10 cycles. In the first generation of DERs, when a fault occurs near these DERs, they must cease to energise regardless of whether they are islanded, because their control systems cannot maintain stability after fault clearance; no low-voltage ride-through (LVRT) control systems are embedded in these DERs. However, in the next generation of networks, the ever-increasing penetration of DERs in distribution networks is the motivating idea behind bidirectional power transfer between the main power network and local networks. Disconnecting DERs after any disturbance can lead to instability problems in power systems. Therefore, LVRT control systems, such as crowbar protection for intense and symmetrical faults and dual vector control for unsymmetrical faults, are used in the DERs [32]. These novel control systems definitely affect the transient behaviour of the measured signals that are used as features. Furthermore, in contemporary networks, the DERs are operated as parts of microgrids. Therefore, not only the grid-connected parts but also the islanded parts must continue operating and supplying the loads.
In addition, at least one of the DERs must switch to voltage control mode or retain its voltage control mode to keep the islanded area stable. Furthermore, every control mode has its own transient behaviour in the post-islanding condition. Thus, the DERs are modelled with these details to obtain a realistic simulation of their operation.
In the contemporary variable-speed wind turbines used in this work, the power is fully or partially injected into the network through a back-to-back converter. Because of the non-deterministic nature of the wind speed and the necessity of continuously supplying the loads, adding a BESS to the DC bus of the back-to-back converter is suggested. Here, to study the effects of the BESS on the measured transient signals used as features, some of the features are extracted while the battery is active. Both wind turbines are equipped with the BESS at the DC terminal, and the simulations are performed with both batteries active or inactive. Figures 5 and 6 show two instances of these conditions and their active and reactive power mismatches. It is seen that when the battery is active and islanding occurs, the rate of change of the parameters is smoother and more stable. Consequently, islanding detection with traditional passive methods deteriorates, and the system stays in the NDZ for a longer time.
According to the scenarios in Table 5, 857 events are simulated for every time window. For the islanded scenarios, the NDZ of the DERs is first extracted [16]; then, most of the islanded scenarios are simulated inside this NDZ and a few outside it. In every group of events, 70% (572 events) of the data is used for training and 30% (285 events) is selected for testing. In addition, to detect the optimum time window (where the minimum number of features with maximum accuracy is preferred), the test system is simulated until 15 cycles after the disturbances, and the features are extracted for every time window covering the first two cycles, the first three cycles and so on up to the first 15 cycles. In fact, 14 (number of time windows) × 4 (number of DERs) = 56 different feature selection runs (NSGA-III) are executed.

SIMULATION RESULTS AND STUDY OF THE NDZ
The NSGA-III-based gradient boosting optimisation is run for each DER, and the Pareto fronts (the best results of the objective functions) are extracted for all time windows. An example of these Pareto fronts, for the DFIG_wind in a 13-cycle time window, is presented as a parallel plot in Figure 7. The results for all time windows are gathered, and only models with 100% accuracy are selected for further processing.
In the present simulation, and specifically for power system operation, the priorities for selecting the best model are as follows. After selecting models with 100% training accuracy in the time windows, the next priority is minimising the number of measurement devices. Thus, a model with a minimum use of measurement devices (minimum cost) and a maximum number of processed features is more suitable. For example, if possible, it is preferable to select the first- and second-order derivatives of the instantaneous active power instead of the instantaneous active power and voltage. The next criterion is the NDZ of the models. Considering the mismatch between generation and consumption and the need to maintain network safety, a model covering all power mismatches and having a zero NDZ is superior. Furthermore, to generalise the model beyond the dataset and reduce its dependency on the network topology, an algorithm with minimum cross-validation loss is suitable. To test the NDZ of the proposed algorithm with different feature sets, first, the NDZs of the DERs are detected using the method in [16]. Then, various simulation scenarios, in addition to the testing and training data, are generated inside the NDZ with various active and reactive power mismatches and load quality factors. The trained models with 100% accuracy on the training and testing data are challenged with these data, and a zero NDZ is observed for all of them. Therefore, if NSGA-III models with high accuracy are selected, the system has a zero NDZ, but more measurement devices are necessary to extract the corresponding features. A compromise between accuracy and economic factors must be made.
None of the selected models violates the time limits of the standards. However, models with a lower response time are preferable. In Table 6, the optimum islanding detection model of each DER over all time windows is given, and the selected time windows of the DERs are highlighted. Moreover, in Table 7, the selected features of the best models obtained using NSGA-III are presented.
The results show that the fully inverter-connected DERs need more complicated models to achieve desirable results. In addition, the response time of the models is very short and reliable. The multi-objective feature selection provides several equally good solutions. Moreover, backup models can be selected that differ from the main model by a minimum number of features. Alternatively, a backup model can be selected with the same features as the main islanding detection model if NSGA-III provides a desirable one, even with lower training accuracy.

COMPARING THE RESULTS
In this section, our results are compared with those of traditional passive methods and intelligence-based methods. The main goal of this paper is to use methods that have a simple feature extraction procedure and avoid complex signal processing. The compared intelligence-based methods are divided into three groups, and the results are presented in Table 8. Furthermore, the feature selection process of this paper and the test network are tested and compared with the decision tree, Adaboost and ELM. The decision tree is one of the most straightforward machine learning methods, and Adaboost is another sequential ensemble of weak learners that can classify the data space nonlinearly. In addition, the ELM is a strong classifier with tuneable coefficients for nonlinear classification. The fitcensemble and fitctree functions of MATLAB are used to test the Adaboost and decision tree algorithms, and the ELM is programmed according to the algorithm in [33]. It is seen that, in the decision tree-based classification, there is always a high NDZ that cannot be fully removed. In addition, the cross-validation accuracy, even for the best feature space, is below 94%. On the other hand, for some feature sets, the Adaboost algorithm overfits, and its accuracy on the test data is very low (below 40%). This phenomenon generally occurs in datasets with many features and few measurement devices. Moreover, with the gradient boosting algorithm, for the same feature set, it is always possible to achieve a high accuracy with a lower total depth of the trees (fewer splits, i.e. lower complexity) in comparison to Adaboost. This condition becomes worse for Adaboost in inverter-based DERs, specifically when the battery is active. Finally, for the ELM, tuning the proper coefficients and extracting the appropriate data space for these coefficients are very laborious procedures. Moreover, the algorithm is seen to overfit when these coefficients are tuned exactly for a specific case.
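For readers who wish to reproduce a similar comparison outside MATLAB, the following sketch trains open-source counterparts of the three classifiers on the same selected feature set. It mirrors the spirit of the fitctree/fitcensemble experiments rather than reproducing them; the data variables and hyperparameter values are placeholders.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def compare_classifiers(X, y):
    """Compare a single decision tree, AdaBoost and gradient boosting
    on the same attribute data space using five-fold cross-validation."""
    models = {
        "decision tree": DecisionTreeClassifier(max_depth=5),
        "AdaBoost": AdaBoostClassifier(n_estimators=100),
        "gradient boosting": GradientBoostingClassifier(
            n_estimators=100, max_depth=3, learning_rate=0.1
        ),
    }
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)  # mean accuracy over the five folds
        results[name] = scores.mean()
    return results
```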

CONCLUSION
This paper has introduced a novel ensemble of intelligent methods for passive islanding detection called gradient boosting.
Intelligence-based passive methods are preferable because they decrease the detection time, avoid the manual setting of thresholds and are independent of the DER type. The main goal of ensemble learning methods is to construct a strong model from an appropriate combination of weak models. These methods are advantageous compared to a single strong model because their parameters are easy to adjust, they are easy to programme in local processing systems, they avoid underfitting, they are less sensitive to the data and they are more robust to misclassified data. A multi-objective optimisation algorithm (NSGA-III) is used to optimise the model parameters. It is demonstrated that models with zero NDZ and a minimum risk of overfitting can be constructed if sufficient data are available. Moreover, the constructed models have an optimum combination of feature space and base-learner complexity. NSGA-III determines the most appropriate parameters based on the network topology and the type of DER. It is demonstrated that more than one acceptable ensemble topology exists in the attribute data space. This, in turn, aids in the selection of a model compatible with the available equipment and economic considerations. Ultimately, these methods can provide reliable, timely islanding detection with high accuracy and compliance with the standards.