Finite‐horizon optimal tracking control for constrained‐input nonlinear interconnected system using aperiodic distributed nonzero‐sum games

Funding information
National Natural Science Foundation of China, Grant/Award Number: 61473147; Postgraduate Research & Practice Innovation Program of Jiangsu Province, Grant/Award Number: KYCX20_0204

Abstract
This paper proposes a distributed adaptive dynamic programming scheme for the finite-horizon optimal tracking control problem of non-linear interconnected systems with constrained inputs under aperiodic sampling. By introducing augmented vectors, an N-player nonzero-sum differential game is constructed from the non-linear interconnected system and the tracking error system. To address the constrained inputs and the finite horizon, a non-quadratic utility function and a finite-horizon cost function are used, which give rise to a time-varying Hamilton–Jacobi (HJ) equation. Then, a periodic event-triggered (PET) scheme is designed to realize aperiodic sampling, which reduces the consumption of communication resources and avoids Zeno behavior. Under the designed PET scheme, an analytical solution of the time-varying HJ equation is almost impossible to obtain because of its hybrid properties and non-linearity. Therefore, critic neural networks are used to estimate the optimal solution of the HJ equation, and a weight update law is constructed to guarantee that the approximation errors are uniformly ultimately bounded. Further, the hybrid nonzero-sum differential game is shown to be uniformly ultimately bounded by using Lyapunov theory. Finally, the resulting distributed PET control strategy is applied to a missile-target interception problem.


INTRODUCTION
As a class of complex systems, missile guidance systems can be modelled as large-scale interconnected systems that consist of several subsystems with interconnections. The interconnection terms among subsystems pose a challenge in designing a stabilizing controller, since they affect the stability and performance of the subsystems and even of the entire interconnected system. Therefore, various control methods have been proposed to analyze and control interconnected systems, including decentralized and distributed controllers. As in [1,2], the decentralized control strategy guarantees the stability of interconnected systems only when the interconnection terms among subsystems are weak. To overcome this limitation, a distributed control strategy was proposed to guarantee the stability and transient performance of interconnected systems with strong interconnections, where the controller was designed by using the system information of local and neighboring subsystems [3]. Although plenty of theoretical results for interconnected systems have been obtained, most of them focus on stability analysis, and few address the optimal tracking control problem.
Note that the optimal tracking controller for non-linear systems is hard to construct through exact mathematical derivation, since the corresponding Hamiltonian function is non-linear and coupled. To approximate the performance index function and the feedback controller, the adaptive dynamic programming (ADP) technique was proposed by using two neural networks (NNs) [4]. The ADP technique has been applied successfully to strict-feedback non-linear systems [5,6], discrete-time non-linear systems [7,8] and non-linear switched systems [9,10]. Very recently, the ADP technique was extended to derive decentralised tracking controllers for non-linear interconnected systems [11,12]. In [11], an observer-critic structure was used to reconstruct the unknown system dynamics and to solve the coupled Hamilton-Jacobi-Bellman (HJB) equation, respectively. In contrast to [11], optimality was considered in [12], and the tracking error subsystems were guaranteed to be asymptotically stable by using a model-free ADP algorithm. Later, the authors of [13] put forward a distributed optimal tracking controller to ensure that large-scale systems with disturbances and saturating actuators are uniformly ultimately bounded (UUB). Nevertheless, the above ADP algorithms were designed without considering the limitations of network bandwidth and computing resources.
Along with the improvement of guidance systems, modern high-precision guidance laws demand more network bandwidth and data storage space. By and large, the time-triggered communication scheme used in [14,15] generates excessive redundant signals and may even affect system stability. To reduce the consumption of communication resources, several aperiodic sampling schemes (ASSs), such as the continuous event-triggered (CET) scheme, the periodic event-triggered (PET) scheme and the self-triggered (ST) scheme, were presented in [16-18]. Using these ASSs, the control strategies are updated only when the designed triggering conditions are violated: the triggering condition is monitored continuously under the CET scheme [16], whereas it is verified only periodically under the PET scheme [17]. Further, as in [18], under the ST scheme the next triggering instant is precomputed based on previously received data and knowledge of the system dynamics. Thus, the ASSs can greatly reduce the consumption of communication resources and avoid unnecessary information transmission between system controllers and actuators. At present, the ASSs are widely used to study optimal control and tracking control problems. In [19], an event-triggered optimal control law for uncertain non-linear systems was constructed in terms of the ADP algorithm, where the actor-critic framework was used to approximate the optimal value function and the control inputs. To simplify the ADP algorithm, critic NNs were used to approximate the value functions, and distributed optimal control strategies were then proposed for non-linear interconnected systems [20]. Using the same framework as [20], the authors of [21] developed an optimal tracking controller for constrained non-linear systems under the ASSs. Until now, however, there have been few reports concerning the optimal tracking control problem for interconnected systems under aperiodic sampling, especially under the PET scheme.
In fact, an interconnected system with strong interconnections can be converted into a nonzero-sum (NZS) differential game by designing augmented vectors, where the controller design for each subsystem is regarded as a player in the NZS differential game. Based on the NZS differential game framework, the authors of [22] proposed a distributed control strategy to guarantee the stability and transient performance of interconnected systems with strong interconnections, where the controller was designed by using the system information of local and neighboring subsystems. The event-based distributed optimal control problem for non-linear interconnected systems was further investigated in [23], in which the cost function was constructed by using the overall system information. However, the finite-horizon and input-constraint problems, which cannot be overlooked in particular applications, are not discussed in the studies mentioned above. Inspired by the aforementioned literature, a distributed optimal tracking controller for the finite-horizon non-linear interconnected system with input constraints is designed under the PET scheme. At first, the non-linear interconnected system is converted into an NZS differential game. Then, critic NNs are utilized to approximate the optimal solution of the coupled HJ equation. Under the event-triggered scheme (ETS), the distributed optimal tracking control strategy is presented to guarantee the UUB of the NZS differential game. The main contributions are threefold:
(1) In contrast to existing decentralized controllers [1,2], this paper develops a distributed optimal tracking control strategy for the non-linear interconnected system with strong interconnection terms, input constraints and finite-horizon constraints.
(2) Compared with the existing literature [16,18,24], the PET scheme is put forward to reduce resource consumption while avoiding Zeno behavior, where the triggering condition is verified only at fixed periodic instants.
(3) This paper presents, for the first time, a distributed PET scheme in an attempt to tackle the optimal tracking problem for the multi-missile cooperative guidance system.
The remainder of this paper is organized as follows. The problem description and transformation are given in Section 2. In Section 3, a PET scheme is proposed to save communication resources. Under this scheme, the distributed optimal tracking problem for the finite-horizon NZS differential game with input constraints is analyzed, and stability criteria are derived to ensure the UUB of the corresponding closed-loop system. Section 4 proposes the distributed optimal tracking control strategy based on critic NNs, and the tracking error and approximation error are proved to be UUB. In Section 5, the multi-missile cooperative guidance system is used to verify the algorithm designed in this paper. Finally, conclusions are drawn in Section 6, and the notations used in this paper are listed in Tables 1 and 2.

Problem formulation
We consider the non-linear interconnected system (1), which is composed of N input-constrained subsystems, where $x_i \in \mathbb{R}^{n_i}$ is the state vector and $u_i$ is the constrained control input of the ith subsystem, and the remaining terms of (1) are the internal dynamics and the input gain function. $\Gamma_{ij}(x_i, x_j)$, defined on $\mathbb{R}^{n_i} \times \mathbb{R}^{n_j}$, is the interconnection term between the ith and jth subsystems.
For the non-linear interconnected system (1), the goal of optimal tracking control is to design N optimal controllers $(u_1^*, u_2^*, \ldots, u_N^*)$ such that the states of every subsystem track the desired trajectories $\bar{y}_i$ within a finite time. The tracking error of every subsystem is defined by (2), where the desired signal $\bar{y}_i(t) \in \mathbb{R}^{n_i}$ is bounded and Lipschitz continuous and satisfies a given dynamic equation for $\dot{\bar{y}}_i$. Taking the time derivative of (2) yields the tracking error system (3).

Problem transformation
In this section, an N-player differential game system (4) is derived by defining augmented vectors.
Remark 1. Combining the definition of $G_i$ with Assumption 1, we know that the augmented term $G_i$ is bounded and satisfies $\|G_i\| \le G_{Mi}$ for every subsystem i, where the $G_{Mi}$ are positive constants.
To achieve the optimal tracking objective, the finite-horizon value function (5) of the NZS differential game (4) is defined, where $Q_i = \mathrm{diag}\{Q_{i1}, 0_{n_1 \times n_1}, \ldots, Q_{iN}, 0_{n_N \times n_N}\} \in \mathbb{R}^{2n \times 2n}$, and $Q_{ij} = \mathrm{diag}\{q_{i1}, \ldots, q_{in_j}\}$ and $0_{n_j \times n_j}$ are a positive definite matrix and the zero matrix, respectively. $\Psi_i(Z(t_f), t_f)$ denotes the terminal cost associated with the ith player, where $t_f$ is the fixed final time. As in [25], generalized non-quadratic functions are used to deal with the input constraints, where $\tanh(\cdot)$ is a strictly monotonic odd function that satisfies $|\tanh(\cdot)| < 1$. Note that the two associated quantities are bounded by constants indexed by $V_{ti}$ and $V_{Zi}$, respectively, and the terminal cost of the finite-horizon value function (5) is specified accordingly. The Hamiltonian function of system (4) is then defined. According to the definition of a Nash equilibrium [26], assume that the Nash conditions hold for all admissible controls, that is, no player can reduce its own cost by unilaterally deviating from its optimal strategy; the optimal value function $V_i^*(Z, t)$ of every player is then obtained. Applying the stationarity conditions, the optimal tracking strategy and the positive semi-definite functional associated with each player of system (4) are calculated, respectively. Inserting equations (10) and (11) into the Hamiltonian function, the time-varying HJ equations (13) can be derived.
Remark 2. (1) In the finite-horizon scenario, the cost function (5) is the time-dependent solution of the coupled HJ equation (13), which is harder to solve than in the infinite-horizon case. (2) Different from the existing literature, an additional cost is added to the ith subsystem. Therefore, the influence of the optimal control strategies of the neighboring subsystems is taken into account in the optimization of the ith subsystem, which ensures that the finite-horizon cost function (5) is optimal both for every subsystem and for the entire system.
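As a rough illustration of how a non-quadratic utility built on the inverse hyperbolic tangent handles input constraints, the sketch below assumes the form that is common in the constrained-input ADP literature, $U(u) = 2\lambda r \int_0^u \tanh^{-1}(v/\lambda)\,dv$, for a scalar input with bound $\lambda$ and weight $r$. The exact expression used in this paper is not shown in this section, so the form itself, the function names and the numeric values are assumptions; only the bound of 100 is taken from the simulation section.

```python
import math


def utility_closed_form(u, lam, r=1.0):
    """Closed form of 2*lam*r * integral_0^u atanh(v/lam) dv, valid for |u| < lam."""
    ratio = u / lam
    return 2.0 * lam * r * u * math.atanh(ratio) + lam ** 2 * r * math.log(1.0 - ratio ** 2)


def utility_numeric(u, lam, r=1.0, steps=20000):
    """Midpoint-rule approximation of the same integral (used only as a cross-check)."""
    h = u / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        total += math.atanh(v / lam)
    return 2.0 * lam * r * total * h


def saturated_control(s, lam):
    """Squash an unconstrained signal s so that the result never exceeds lam in magnitude."""
    return lam * math.tanh(s / lam)


# Assumed bound taken from the simulation section (|u_i| < 100).
LAMBDA = 100.0
cost_at_half_bound = utility_closed_form(50.0, LAMBDA)
```

The utility is zero at the origin, positive for inputs of either sign, and grows steeply as the input approaches the bound, while the `tanh` map keeps any control signal inside the admissible range.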
In general, the optimal solution of the time-varying HJ equation for every subsystem is almost impossible to calculate since it is a coupled partial differential equation. Therefore, the ADP technique is used to approximate the optimal solution through critic NNs, and the distributed optimal tracking controller is then derived approximately. However, time-triggered controllers inevitably waste computing and communication resources, as in [14,15]. To settle this issue, the PET scheme is used to design the distributed optimal tracking control strategy for the non-linear interconnected system (1).

DISTRIBUTED OPTIMAL TRACKING STRATEGIES VIA PET SCHEME
First, we design a PET scheme to reduce the frequency of information transmission and controller updates while ensuring the tracking performance of the error system (3). Then, event-based HJ equations and control strategies for the N-player NZS differential game system (4) are derived, in which Zeno behaviour is avoided since the sampling period is a positive constant. Further, a theorem is given to ensure the stability of the corresponding closed-loop NZS differential game.

Periodic event-triggered scheme
As shown in Figure 1, the system states of the PET control system are sampled with a constant positive period, which yields the sampled signals.
Remark 3. As shown in Figure 1, the tracking error and desired trajectory signals are sampled with this constant period in the Sampler, and the sampled signals of every subsystem are then transmitted to the Event Generator. When the triggering condition is violated, the sampled signals are passed to the controller (i.e. the controller is updated). Triggering conditions therefore only need to be verified periodically under the PET scheme, and the minimum inter-event time is the sampling period, so the Zeno phenomenon is excluded.
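The sampling logic described in Remark 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the triggering rule of the paper: the monitored signal is scalar, the trigger threshold is a fixed constant, and the names `pet_events`, `period` and `threshold` as well as the test signal are hypothetical. The period of 0.02 s and the 16 s horizon are taken from the simulation section.

```python
import math


def pet_events(signal, period, horizon, threshold):
    """Return the sampling instants at which an event is triggered.

    signal:    callable t -> float, the monitored quantity (assumed scalar)
    period:    constant sampling period; the condition is checked only at l * period
    horizon:   final time of the run
    threshold: hypothetical fixed trigger threshold on the gap
    """
    n_samples = int(round(horizon / period))
    last_sent = signal(0.0)           # the first sample is always transmitted
    events = [0.0]
    for l in range(1, n_samples):
        t = l * period
        sample = signal(t)
        gap = abs(last_sent - sample)  # gap between last triggered and current sampled value
        if gap > threshold:            # condition violated: pass the sample to the controller
            last_sent = sample
            events.append(t)
    return events


# Hypothetical decaying oscillation standing in for a tracking error signal.
events = pet_events(lambda t: math.exp(-0.3 * t) * math.cos(2.0 * t),
                    period=0.02, horizon=16.0, threshold=0.05)
min_gap = min(b - a for a, b in zip(events, events[1:]))
```

Because the condition is evaluated only at multiples of the period, two consecutive events can never be closer than one period, which is the mechanism that rules out Zeno behaviour in this setting.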
Combining this with the definition of the augmented vector $Z(t)$, the state vector of the NZS differential game system at the triggering instants is defined as $\check{Z}_r = [\check{z}_{1r}^T, \ldots, \check{z}_{Nr}^T]^T \in \mathbb{R}^{2n}$. For the ith subsystem, the triggering gap $e_{ir}(t)$ is the gap between the last triggered state $\check{z}_{ir}$ and the current sampled state, and the next triggering instant is the first sampling instant at which this gap violates the triggering condition, where $e_{Ti}$ is the trigger threshold.
Note that the triggering gap $e_{ir}(t)$ is reset to zero when an event is triggered at $t = t_r$. Meanwhile, the optimal tracking controller and the positive definite functional are updated. Then, the hybrid NZS differential game can be obtained. Combining (13) and (15), the PET optimal tracking HJ equation is derived.
Assumption 2. The optimal tracking strategy of every subsystem is Lipschitz continuous with respect to the triggering gap $e_{ir}(t)$, with positive Lipschitz constants.
Remark 4. As in [16,18], controllers designed under the CET and ST mechanisms may exhibit Zeno behaviour (the occurrence of an infinite number of events in finite time), which degrades system performance and can destroy system stability. Compared with these schemes, triggering conditions only need to be verified periodically under the PET scheme, as shown in Figure 1, which reduces the resource consumption during the detection process and avoids the Zeno phenomenon.

Distributed optimal tracking control strategy via PET scheme
Theorem 1. Consider the augmented NZS differential game system (4). Let Assumptions 1-2 hold, and suppose that $V_i^*(Z, t)$ is a solution of the HJ equation associated with the ith player. Then, the UUB of the corresponding closed-loop system is guaranteed under the PET optimal controller if the triggering conditions (21) are satisfied.
Proof.
Choose the Lyapunov function $L_0(Z, t)$ in terms of $V_i^*(Z, t)$, the solution of the HJ equation (19). Taking the time derivative of $L_0(Z, t)$ along the closed-loop trajectories and rewriting it with the HJ equation leads to the bound (22) on $\dot{L}_0(Z(t), t)$. It is worth pointing out that (22) yields (23) if the triggering condition (21) holds. Therefore, $\dot{L}_0(Z(t), t) < 0$ as long as $Z(t)$ lies outside a compact set. According to the Lyapunov extension theorem, the tracking error trajectory of the N-player differential game system (4) is UUB under the PET optimal control strategy and the triggering condition (21). □

DISTRIBUTED OPTIMAL TRACKING CONTROLLER DESIGN VIA APERIODIC CRITIC NNS
In this section, a group of approximate solutions for the event-based tracking error system is obtained by using the adaptive dynamic programming technique, where critic NNs are used to reconstruct the optimal value function and the optimal control strategy.

The aperiodic critic NNs design
According to the universal approximation property of NNs [27-29], the finite-horizon cost function and its terminal cost are approximated with time-dependent activation functions and ideal weights. Both the ideal weight vector of the ith critic NN and the activation function vector take values in $\mathbb{R}^L$, where L denotes the number of hidden-layer neurons, and the approximation error depends on $(Z, t_f - t)$. The terminal cost function of the ith subsystem is approximated in the same way. The solution of the HJ equation for every subsystem can thus be approximated by critic NNs, and the partial derivatives of $V_i(Z, t)$ can be calculated accordingly. Combining (11) and (15), the optimal tracking control strategies under the time-triggered and the event-triggered schemes can be written in terms of the ideal weights. Inserting (26) and (27) into (20), the Hamiltonian function is expressed through the critic NNs.
Since the ideal weights of all subsystems are unknown, critic NNs with estimated weights are constructed to estimate the cost function, where the estimated weight vector lies in $\mathbb{R}^L$ and $\hat{Z}(t_f)$ can be derived from the system dynamics and the current states. Similarly, the partial derivatives of $\hat{V}_i(Z, t)$ with respect to Z and t are derived. Using (11) and (12), the approximate optimal control strategy and the positive definite function are obtained, and (37)-(39) are then rewritten under the PET scheme. From (40)-(42) and (31), the approximate Hamiltonian function is obtained. Define the weight error as the difference between the ideal weight and its estimate. Then, (44) can be rewritten in terms of this weight error.

The approximation error of the Hamiltonian function is defined accordingly.
According to (8) and (34), the terminal estimation error is obtained. To handle the optimal game system, the errors associated with both the time-varying cost function and its terminal constraint should be minimized along the system states. Inspired by [14,19,20], a total squared error is defined. The update law of the estimated weight vector is given by (47), with a separate learning rate for each subsystem. From (31) and (32), and using the definition of the weight error together with (46), the time derivative of the weight error is obtained.
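The idea behind a learning-rate-driven weight update can be sketched as a plain gradient descent on a squared error. The following is only an illustration under assumptions: the critic is linear in its weights, a single scalar residual $e = w^T \phi - \text{target}$ stands in for the combined Hamiltonian and terminal errors, and the continuous-time law is discretized with an Euler step. The exact update law (47) of the paper is not reproduced here, and the names `squared_error`, `gradient_step`, `phi` and the numeric data are hypothetical; the rate 0.01 matches the first learning rate reported in the simulation section.

```python
def squared_error(weights, features, target):
    """E = 0.5 * e**2 with residual e = w^T phi - target."""
    e = sum(w * p for w, p in zip(weights, features)) - target
    return 0.5 * e * e


def gradient_step(weights, features, target, rate, dt):
    """One Euler step of w_dot = -rate * dE/dw = -rate * e * phi."""
    e = sum(w * p for w, p in zip(weights, features)) - target
    return [w - dt * rate * e * p for w, p in zip(weights, features)]


# Hypothetical data: three basis-function values and one target level.
phi = [1.0, 0.5, -0.25]
w = [0.0, 0.0, 0.0]
history = []
for _ in range(200):
    history.append(squared_error(w, phi, target=2.0))
    w = gradient_step(w, phi, target=2.0, rate=0.01, dt=1.0)
```

With a single fixed feature vector only the residual is driven towards zero; recovering the individual weights requires sufficiently rich data, which is the role a probing noise usually plays in this kind of scheme.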

Stability analysis
The closed-loop system (18) and the weight estimation errors of all subsystems are proved to be UUB according to the Lyapunov extension theorem. To facilitate the analysis, several common assumptions are given, which are also used in [30-32].

Assumption 3. The ideal weight vector, the activation function and the approximation error of every critic NN, the latter two being functions of $(Z, t_f - t)$, are assumed to be norm bounded. The gradients of the activation function and of the approximation error are also norm bounded.
Proof. Choose the Lyapunov function $L(t)$ and consider the following two cases.
Case 1: When t lies in the interval between two consecutive triggering instants, events are not triggered. It is clear that $V_i^*(\check{Z}_r, t) = 0$ for every subsystem.
Case 2: Events are triggered, namely, t equals the next triggering instant. In terms of $L(t)$, the difference of the Lyapunov function is derived. Since the system states and the optimal value functions are continuous, the resulting inequalities involve class-$\mathcal{K}$ functions of the jump $\check{Z}_{r+1} - \check{Z}_r$ of the triggered state. This indicates that the difference of $L(t)$ is negative when events are triggered. By analyzing these two cases, the UUB of the system states and of the approximation errors of the critic NNs is ensured according to the Lyapunov extension theorem [33], as long as the inequality conditions (48)-(50) hold. □

Application to the cooperative guidance system
In this subsection, a multi-missile cooperative guidance system borrowed from [34] is used to verify the effectiveness of the optimal tracking control algorithm proposed in this paper. As shown in Figure 2, we consider the case of two missiles attacking the same target from different directions. Assume that the velocity vector between the missiles and the target is constant during the engagement; then the relative kinematics can be expressed by equation (60), and the related notations are listed in Table 2.
Inspired by [34,35], a new time variable $t_{goi} = -r_i / \dot{r}_i$ is introduced to satisfy the finite-horizon constraint, and new state variables are defined. The dynamics equation (60) can then be rewritten as (61), where $\bar{u}_i = t_{goi} u_i$ ($|u_i| < 100$), $i = 1, 2$, and $\bar{v}_i = t_{goi} v$. As in [34], the dynamics equation (61) indicates that the conditions for a missile to capture the target aircraft during the terminal guidance phase are that the line-of-sight (LOS) angular rate tends to zero and the range rate satisfies $\dot{r}_i < 0$, $i = 1, 2$.
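The time-to-go variable and the scaled input above amount to simple arithmetic, which the following sketch makes concrete. The function names and the numeric values (a 5000 m range closing at 312.5 m/s) are hypothetical and chosen only for illustration; the formulas $t_{go} = -r/\dot{r}$ and $\bar{u} = t_{go} u$ are the ones stated in the text.

```python
def time_to_go(r, r_dot):
    """t_go = -r / r_dot; only meaningful while the range is closing (r_dot < 0)."""
    if r_dot >= 0.0:
        raise ValueError("range is not closing, so the time-to-go is undefined")
    return -r / r_dot


def scaled_input(u, t_go):
    """u_bar = t_go * u, the scaled control appearing in the rewritten dynamics."""
    return t_go * u


# Hypothetical engagement geometry: 5000 m range, closing at 312.5 m/s.
t_go = time_to_go(5000.0, -312.5)
u_bar = scaled_input(2.0, t_go)
```

The guard on `r_dot` mirrors the capture condition that the range rate stays negative: once the range stops closing, the time-to-go is no longer a valid finite horizon.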
In the terminal guidance phase, the initial values of the parameters in (61)-(63) are chosen as $(x_T, y_T) = (5000, 0)$ m, $V_T =$ […]. With respect to the update laws (47), the learning rates of the two subsystems are set to 0.01 and 0.02, respectively. As in [34,35], a probing noise is introduced in the first 15 seconds to excite the system.
Remark 5. The learning rate is an important hyperparameter in neural network weight training. To show how the learning rate affects the convergence speed of the critic NN weights, several values of the learning rates of the two subsystems are adopted, and the corresponding simulation results are given in Table 3.

Simulated results and analysis
Simulation results are depicted in Figures 3-14, which show that the distributed control method achieves good control performance and stabilization compared with the centralized control method. Meanwhile, a large amount of communication resources is saved by using the designed PET control scheme. In contrast to the single-missile attack method used in [35], the distributed guidance law proposed in this paper realizes the cooperative operation of several missiles, where different missiles use different guidance laws. As shown in Figure 3, the target aircraft is successfully intercepted by two missiles fired simultaneously from different locations. The proposed distributed optimal tracking control strategies thus improve missile penetration capability. Further, the relative distances between the missiles and the target tend to zero within 16 seconds, which satisfies the finite-horizon constraint. In Figure 4, the tracking errors converge to a small region around zero, and the control signals of the two subsystems are shown in Figure 5. The internal dynamics of the multi-missile interceptor system are depicted in Figures 6-11, and Figures 6 and 8 reveal that the LOS angular rates and the range rates $\dot{r}_i$ always satisfy the capture conditions in (62): the LOS angular rates stay in a neighborhood of zero and the range rates $\dot{r}_i$ are always less than zero. The convergence of the critic NN weight vectors is plotted in Figure 12, and the sampling numbers of the two subsystems under the proposed PET scheme are depicted in Figure 13, where the numbers of PET-based samples for subsystem 1 and subsystem 2 are 107 and 146, respectively. By contrast, both subsystems need to sample 800 times under the traditional periodic sampling (PS) scheme. This indicates that the sampling scheme presented in this paper saves 86.63% and 81.75% of the communication resources for the two subsystems, respectively.
Compared with the CET and ST schemes, the PET scheme avoids Zeno behaviour and reduces communication costs at the same time. As shown in Figure 13, the minimum sampling interval under the PET scheme is greater than or equal to the sampling interval under the PS scheme, that is, 0.02 s, which confirms that the PET optimal tracking control strategy excludes Zeno behaviour.
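The reported savings can be checked with a few lines of arithmetic. The sketch below assumes that the 800 periodic samples correspond to the 16 s engagement sampled every 0.02 s, which is consistent with the numbers given above but is not stated explicitly; the helper names are hypothetical.

```python
from fractions import Fraction


def periodic_samples(horizon, period):
    """Number of samples taken by a purely periodic scheme over the horizon."""
    return round(horizon / period)


def saving(event_samples, periodic):
    """Fraction of transmissions avoided relative to periodic sampling."""
    return 1 - Fraction(event_samples, periodic)


total = periodic_samples(16.0, 0.02)   # assumed: 16 s horizon, 0.02 s period
s1 = saving(107, total)                # subsystem 1: 107 event-triggered samples
s2 = saving(146, total)                # subsystem 2: 146 event-triggered samples
```

Here `s1` equals 693/800, that is 86.625%, which matches the reported 86.63% after rounding, and `s2` equals 654/800, that is exactly 81.75%.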

CONCLUSION
In this paper, an aperiodic distributed optimal tracking control strategy has been developed for finite-horizon interconnected systems with constrained inputs. The large-scale interconnected system has been transformed into an N-player NZS differential game. Then, the distributed optimal tracking problem with finite-horizon constraints has been converted into solving time-varying HJ equations. In order to approximate the optimal solution effectively, critic NNs and the PET scheme have been employed. Based on this scheme, the system structure has been simplified and the consumption of communication resources has been reduced, while the Zeno phenomenon has been avoided. Finally, simulation results have shown that the update frequency of the distributed optimal tracking controller can be greatly reduced and that the target aircraft can be intercepted by the missiles successfully. Next, we will extend the proposed control strategy to the dynamic event-triggered control problem of unknown non-linear systems.