Optimal agency contract for incentive and control under moral hazard in dynamic electric power networks

The authors propose an optimal contract mechanism under moral hazard in discrete-time dynamic electric power networks. As the utility (system operator) cannot adjust the control input of the agents (end-users) directly in real time out of respect for individual decision-making, the agents' control input maximising their own profit does not always maximise social welfare. To avoid the issue, the authors introduce an aggregator as an intermediary between the utility and the agents. The aggregator pays compensation for defective ancillary services, which are caused by random disturbance and the agents' voluntary control. To reduce the compensation risk, the authors first present an optimal incentive/control contract problem for the aggregator's compensation. The problem is usually regarded as a principal-agent problem under moral hazard in contract theory. However, it is generally difficult to solve a contract problem with dynamics expressed as discrete-time simultaneous Bellman equations and a hierarchical control structure as a Stackelberg game. The authors next show that the problem can be solved by regarding it as a linear-exponential-quadratic-Gaussian dynamic game and employing a numerical optimisation technique. Due to the ex-ante appropriate payment contract, the agents select control inputs preferable for the aggregator. The effectiveness of the proposed contract mechanism is finally demonstrated through simulation.


Introduction
In dynamic electric power systems transactive energy and control play increasingly important roles [1][2][3]. More specifically, it is necessary to reconstruct cost-effective infrastructures using local generators, renewables and demand response units, while reflecting both the economic and control purposes of the agents (end-users).
In the past 5 years, transaction systems have been realised in competitive electricity markets that maintain so-called ancillary services, such as suitable frequency, voltage and power, in real time (e.g. [3,4] and references therein). The authors in [5][6][7] highlight the state-of-the-art mechanisms and future challenges based on market-centric power control. Notable examples of the real-time electricity balancing market include the economically and physically integrated system in [2,8] and the dynamic mechanism design approach to incentivise agents via real-time pricing in [9][10][11]. In these papers [2,8][9][10][11], both the agents and the utility (system operator) participating in the electricity markets operate in a decentralised manner under a game-theoretic framework, which is different from the conventional centralised control. The authors in [9][10][11], however, implicitly assume that all the agents always provide truthful reports and optimal controls. The incentive based on the mechanism design used in [9][10][11] guarantees the optimality of strategic agents' profits as long as they continue to report the true values of model information. Meanwhile, mechanism design-based incentives effectively restrict the agents' controls to the set that optimises social welfare, which is the utility's objective. In other words, as the utility cannot adjust the agents' control input directly in real time out of respect for individual decision-making, the agents' control input maximising their own profit does not always maximise social welfare. This is called moral hazard in economics [12].
In this study we present a novel incentive and control mechanism producing high-quality ancillary services and social welfare maximisation under moral hazard. In the context of power engineering, there are many papers using mechanism design (e.g. [9][10][11] and references therein) or adverse selection (e.g. [13,14] and references therein), but to the best of our knowledge, there is no paper on the moral hazard problem except ours [7,15]. A formal problem description has been presented in a continuous-time setting [7] and some preliminary ideas to derive an implementable control law were discussed in [15]. To achieve our objective, we introduce an aggregator as an intermediary between the utility and the agents, and investigate an appropriate economic incentive for the provision of private information from the agents. As the utility's objective is to achieve high-quality ancillary services, the aggregator is obligated to pay compensation for the quality of several services to the utility while receiving network model information and an initial commission fee. To reduce the compensation, it is desirable that the aggregator shares the economic volatility risk with the agents whose controls are one of the major causes of that volatility. Munoz et al. [16] point out that the impact of risk-averse action by agents is one of the most important research factors. Our challenge in this paper is to develop a novel contract mechanism for incentive and control between a risk-neutral aggregator and risk-averse agents on a dynamic electric power network.
The relationship between the aggregator and the agents can be regarded as a principal-agent (agency) problem under moral hazard in contract theory. In contract theory, a dynamic contract methodology for single-principal and single-agent problems with one product under moral hazard was considered in [17,18], and a static contract design for multiple agents was investigated in [12,19]. In particular, the multi-task case was considered in [19]. However, the problem formulation used in economics [12][17][18][19] cannot be applied to an engineering problem with general cyber-physical control systems, as the control dimension is required to be the same as that of the state in the economic context. The authors in [13,14] present a contract theory approach in power systems, but they handle only the market-based energy transaction without considering dynamic supply-demand balancing and risk-sensitive controls. From the point of view of the control field, a dynamic contract mechanism using price incentives for a risk-sensitive principal in the presence of multiple agents with dynamics, which does not consider the case of multiple resources, can be found in [20]. Regarding risk-limiting control in the context of power systems, a multi-stage stochastic decision problem for consumption, ignoring the hierarchical structure of a system operator and consumers, was considered in [21].
In this paper, we first formulate a novel optimal contract problem for incentive and control between multiple risk-averse agents and a risk-neutral aggregator to improve the performance of multiple ancillary services in a standard discrete-time dynamic electric power system, called the average system frequency model, which was introduced in [22]. It is generally difficult to solve a contract problem with dynamics expressed as discrete-time simultaneous Bellman equations. However, thanks to our specific problem formulation, we can show that the optimisation problem can be solved by regarding it as a linear-exponential-quadratic-Gaussian (LEQG) dynamic game [23][24][25] and employing a standard global optimisation algorithm (note that the authors in [23][24][25] do not handle hierarchical incentive problems, such as our proposed problem). As a result, the optimal contract can incentivise the economically rational agents to take the proper control action desired by the aggregator; otherwise the agents will incur very large monetary loss. The effectiveness of the proposed contract mechanism is finally demonstrated through simulation.
The contributions of this paper can be summarised as follows: (i) a novel contract problem formulation under moral hazard to guarantee ancillary services on dynamic electric power networks is presented (Sections 2.2 and 3); (ii) a solution technique for finding the optimal incentive and implementable control actions, which integrates an LEQG dynamic game with numerical optimisation solvers, is proposed (Section 4); and (iii) the proposed approach is demonstrated in simulations with a standard power network model (Section 5), whereas such verification has not always been made in the existing works on principal-agent problems.

Dynamic electric power network model
This paper considers real-time frequency regulation on M-area networked dynamic power systems, where the power flow on the network is managed by the utility (e.g. system operator), each area j ∈ ℳ := {1, …, M} has a set of regulatable electricity producers G j , such as fast-response steam turbines or hydraulic turbines, and a set of consumers D j providing a demand response (Fig. 1). Each area j is assumed to be connected with some of the other areas ℳ j ⊆ ℳ∖{ j}. Then, from [9,10,22], we introduce a generic model for load-frequency control problems, called an average system frequency model, as shown below.
In this paper, we are concerned with the deviation of frequency and power flows at primary and secondary control levels under the assumption that the energy dispatch and set-point scheduling during the future short-time intervals t = 0, 1, …, T have been determined at the tertiary control level. See [7] for more details.
Let ω t j , P t j and ζ t j be the deviation of frequency in area j, the net imbalance power in area j and the net tie-line power injected from the network to area j at time t, respectively. Then, the power flow dynamics for each area j ∈ ℳ are described by (1a)-(1c), where H j is the net inertial constant of the generators in area j, Y jk is the inverse of the line inductance between j and k ∈ ℳ j , and w t j is the disturbance, including the stochastic nature of renewables and demand, in area j. The variables q t g and q t d denote the active power generated by g ∈ G j and the active power consumed by d ∈ D j at time t, respectively. These are given by (2a)-(2b), whose coefficients, including C d , are system parameters and matrices determined by mathematical modelling of individual components of power systems (see e.g. [22]). The amount of electricity q t g generated by each producer g ∈ G j obeys its individual dynamics (2a) with its local states ξ t g including q t g and the set-point control signal u t g . The amount q t d consumed by each consumer d ∈ D j likewise obeys its individual dynamics (2b). By Kirchhoff's law on tie-lines, ζ t M can be rewritten as a function of the other tie-line powers ζ t j , j ≠ M. To guarantee the invertibility in Lemma 1, in practice, we remove ζ t M from x t . The set N indexes the agents, collecting together all the producers and the consumers with individual controllers; N denotes the number of elements in N. The process {w t } is i.i.d. Gaussian noise with mean zero and covariance W t = (W t ) ⊤ > 0, where the notation ⋅ ⊤ denotes the transposition of a matrix or a vector. Then, we can obtain the dynamic power network system as a discrete-time linearised time-varying model along the scheduled trajectory: $x_{t+1} = A_t x_t + \sum_{i \in N} B_t^i u_t^i + D_t w_t$ (3) for t = 0, …, T − 1. 
The matrices A t , B t i and D t are determined from the linear dynamical systems (1a)-(1c) and (2a)-(2b), and x t is the collection of all the network states ω t j , ζ t j and the local states ξ t g and ξ t d . Let X(x, u) denote the collection of states along (3) with the initial state x at time t = 0 and an admissible control profile u := (u i ) i ∈ N . With the notation u −i := (u j ) j ∈ N∖{i} denoting the profile of the control inputs of all the agents other than i, we will sometimes write u as (u i , u −i ). In this paper, we assume that an admissible control input u t i at time t depends only on the state x t at time t.
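A rollout of such a linearised model can be illustrated with a short Python sketch. It assumes the additive form x_{t+1} = A x_t + Σ_i B_i u_t^i + D w_t with time-invariant matrices and linear state-feedback inputs u_t^i = −K_i x_t; the function name `simulate` and the gains `K_list` are hypothetical and are not taken from the paper.

```python
import numpy as np

def simulate(A, B_list, D, K_list, x0, W, T, rng):
    """Roll out x_{t+1} = A x_t + sum_i B_i u_i + D w_t with u_i = -K_i x_t.

    Assumed (hypothetical) setup: time-invariant matrices, one gain K_i per
    agent, and w_t ~ N(0, W) drawn independently at each step.
    """
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(T):
        # Each agent's input depends only on the current state x_t.
        control = sum(B @ (-K @ x) for B, K in zip(B_list, K_list))
        w = rng.multivariate_normal(np.zeros(W.shape[0]), W)
        x = A @ x + control + D @ w
        traj.append(x.copy())
    return np.array(traj)  # shape (T + 1, state dimension)
```

Averaging many such rollouts over independent noise draws gives the kind of mean trajectory that a simulation study would plot.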

Quality of services and real-time control
We next focus on the ex post compensation for defective ancillary services. There are numerous options to define ancillary services concretely and evaluate them quantitatively. Here, let us denote the set of L ancillary services by ℒ := {1, …, L} and the quality of each service l ∈ ℒ by y l . From the point of view of power system control [2,8], it is highly desirable to reduce the frequency deviations ω T j of each area j ∈ ℳ at the terminal time T (normally T = 30 s or 5 min). In this paper, we assume that ω T j is taken as the jth ancillary service at time T. Then, we can define y j := ω T j to be the quality of that service and obtain the collection of their quality y := (y l ) l ∈ ℒ . In [22], the deviations for the ancillary services ℒ are evaluated as the penalty functions at the terminal time. This paper follows the approach in [22] and the penalty function of each service l ∈ ℒ is given by π l (y l ) based on its quality y l . For instance, the penalty function π l (y l ) can be regarded as the blackout risk over the whole network caused by the service l. Although there might be a variety of options for the function π l (y l ), this paper introduces a quadratic-form penalty function, i.e. π l (y l ) = α l (y l ) 2 , l ∈ ℒ. The penalty coefficient α l is a positive coefficient that is defined by the utility. Therefore, given the initial state x at time t = 0, the utility aims to influence the agents' control inputs u so as to reduce the expected compensation risk, which is formulated as $J_u(u) := E_x\big[\sum_{l \in \mathcal{L}} \pi_l(y_l)\big]$ (4), where E x indicates the expectation operator under an initial state x at time t = 0. Meanwhile, each agent i ∈ N basically chooses their own control input to lower their own system cost (5) during the period, whose weighting matrices are control design parameters of agent i. As the dynamics of state deviations depend on all the agents' control inputs u, the agents are in a competitive relationship, which is just a dynamic game among the agents. 
Then, in general, the collection of the voluntary control inputs based on (5) is very different from that minimising (4).
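For intuition, when the closed loop is linear with Gaussian noise, the expected quadratic penalty at the terminal time has a closed form: E[x_T^⊤ Q x_T] = μ_T^⊤ Q μ_T + tr(Q Σ_T), where μ_T and Σ_T are the terminal mean and covariance. The sketch below propagates both quantities. The function name and the time-invariant closed-loop matrix `A_cl` are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def expected_terminal_penalty(A_cl, D, W, Q, x0, T):
    """Return E[x_T' Q x_T] for x_{t+1} = A_cl x_t + D w_t, w_t ~ N(0, W)."""
    mean = np.asarray(x0, dtype=float)
    cov = np.zeros((mean.size, mean.size))  # x_0 is known exactly
    for _ in range(T):
        mean = A_cl @ mean                        # propagate the mean
        cov = A_cl @ cov @ A_cl.T + D @ W @ D.T   # propagate the covariance
    return float(mean @ Q @ mean + np.trace(Q @ cov))
```

With a quadratic penalty π l (y l ) = α l (y l ) 2 on terminal frequency deviations, Q would be a diagonal matrix carrying α l at the coordinates of the corresponding frequencies and zeros elsewhere.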
To realise the optimisation of (4) systematically, the authors in [9,10] have presented a balancing market enabling a distributed implementation via real-time pricing, as shown in Fig. 2. Once an appropriate pricing mechanism p t i (x) based on all the true model information has been determined, the utility with the balancing market can suitably manage the network in a distributed fashion by using only the on-line information x t . However, the above system relies on the fundamental assumption that the agents truthfully report their private model information to the utility: the system model (2a)-(2b) and the cost function (5). A truth-telling mechanism based on the mechanism design was also presented in [9,10]. Throughout this paper, we assume that all the participants tell the truth. Even if the truth-telling assumption is satisfied, it is necessary to prepare an economic incentive as a consideration for the provision of private information. This is called an incentive compatibility constraint [17,19].
Here let us introduce an aggregator as an intermediary between the utility and the agents. In this paper, the aggregator is defined as a commercial organisation with the following two functions. The first is to bundle internally the multiple agents, whose interests conflict with one another, while acting externally as a representative of the agents. The second is to negotiate a pricing rule with the utility (the system operator), while protecting the agents' private information, e.g. system parameters and cost functionals, from the utility. The proposed contract and control architecture is summarised in Fig. 3. The utility signs an outsourcing contract for real-time power network stabilisation with the aggregator. The aggregator gets an initial commission fee s 0 (x) depending on the initial state x at time t = 0 from the utility, and provides the real-time pricing mechanism p t i (x) for the utility. As the objective of the utility, as the system operator, is to achieve high-quality ancillary services, the aggregator is obligated to pay the compensation ∑ l ∈ ℒ π l (y l ) for the quality of service provided to the utility while receiving model information about the dynamic electric power network (1a)-(1c). To reduce the compensation caused by state deviations and ensure its own net profit, the aggregator hopes to share the volatility risk with the agents, whose controllers are one of the major causes of that volatility. Therefore, the aggregator signs a contract with each agent i ∈ N on condition that the agent obtains a larger economic gain based on an economic incentive s i (y) than the current satisfaction level. As a result, the aggregator obtains the private information of the agents given by (2a)-(2b) and (5), while each agent i is obligated to give the utility the on-line information ξ t i to stabilise the power network.
We expect the proposed decision procedure to have the following side benefit. If the utility itself took on the functions of the aggregator, it could adopt an unfaithful pricing mechanism, and none of the agents would be able to expose this misconduct. Meanwhile, in our setting, as both the aggregator and the utility, which are the interested parties, can monitor the behaviour of each other, each of them tends to adopt its own optimal behaviour. In the next section, we present the desired contract mechanism s i (y) between the aggregator and the agents.

Optimal contract mechanism in agency relationships
To derive an appropriate economic contract s i (y), we apply the contract theory of principal-agent problems to the above problem. In the context of principal-agent problems, the aggregator, the controllable producers and consumers N, and the ancillary services ℒ are regarded as principal, agents and tasks, respectively [12,19]. From [15,17,19], to share the compensation with an agent i ∈ N, we assume that s i (y) is composed of the weighted sum of the compensation for each task and a constant incentive, as in (6), where β i := (β io , (β il ) l ∈ ℒ i ), β il ≥ 0, and ℒ i ⊆ ℒ is the set of tasks for which agent i bears responsibility. The coefficient β il indicates the risk-sharing level of each task l ∈ ℒ i , and β io represents an economic support for incentivising the agents to participate in the contract.
We see from π l (y l ) = α l (y l ) 2 = α l (ω T l ) 2 and (6) that the matrix Q T i is a symmetric matrix depending on (β il ) l ∈ ℒ i . The reason why linear contract models (6) are used is discussed in [17]. In Section 5, we consider the case where ℒ is divided among the agents, that is ℒ i ≠ ℒ. See the last case of Section 5 for more details.
The main objective of this paper is for the aggregator to determine appropriate contract parameters β := (β i ) i ∈ N before executing the real-time procedure, as shown in Fig. 2. In the above situation, the cost functional J 0 and the ex post payment z 0 of the risk-neutral aggregator are given by and Q u is a suitable matrix derived from the ancillary services (4). The aggregator aims at minimising J 0 (z 0 (X(x, u); β)), and the decision parameters of the aggregator are not only β but also the uncontrollable u. Even if the aggregator knows all the private model information, (1a)-(1c), (2a)-(2b), (5) and (8a) shown below, in advance, the aggregator cannot see the control input u selected by the agents until the real-time control system shown in Fig. 2 is executed. Hence, the control input u is regarded as the agents' hidden action and the problem is a so-called moral hazard problem [12,19]. The aggregator therefore has to estimate the compensation from the acquired model.
In addition, from [16], the agents are normally risk-averse to uncontrollable environmental disturbances. Following the approach used for standard contract problems, given in [19], we use a cost functional J i (z i ) described by an exponential function, as in (8a). Then, the total outgoings z i of the risk-averse agent i during the period are composed of the tariff required from the aggregator and their own system cost (5), as in (8b), where r i > 0 is a risk-aversion coefficient. We see from (8b) and the definition of Q t ii that Q t i = (Q t i ) ⊤ ≥ 0, t = 0, …, T. Each agent i adopts the input strategy u i to minimise J i (z i ). In summary, given the model information (1a)-(1c), (2a)-(2b), (5), and (8a), the economic contract (4), (6), and the state x at time t = 0, the aggregator solves the optimal contract problem (9a)-(9c) with dynamics (3), where z̄ i (x) represents a maximum limit for agent i's payment. Conditions (9b) and (9c) represent incentive compatibility constraints and individual rationality (participation) constraints, respectively [17,19]. Constraint (9b) means that the contract parameter β leads the agents into adopting a control input that minimises their own payments. Constraint (9c) guarantees participation. Under the constraints, the aggregator determines the contract parameter β and the corresponding control action u that minimises its own net payment. Note that the description in (9a) follows the standard economic literature [12][17][18][19]. In the next section, we will find the optimal parameter β* and the corresponding control input u*, which are the solution of (9a)-(9c).
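To see how an exponential cost encodes risk aversion, consider the simplified case where the payment z is Gaussian with mean μ and variance σ². This is an assumption made only for illustration, since in the contract problem z i is quadratic in the state. In that case (1/r) log E[exp(r z)] = μ + r σ²/2, so the agent behaves as if it faced its mean payment plus a risk premium r σ²/2. The sketch below checks this identity by Monte Carlo; all function names are hypothetical.

```python
import numpy as np

def certainty_equivalent_gaussian(mean, var, r):
    """Closed form of (1/r) * log E[exp(r z)] for z ~ N(mean, var)."""
    return mean + 0.5 * r * var

def certainty_equivalent_mc(samples, r):
    """Monte Carlo estimate of (1/r) * log E[exp(r z)] from samples of z."""
    a = r * np.asarray(samples, dtype=float)
    shift = a.max()  # log-sum-exp shift for numerical stability
    return float(shift + np.log(np.mean(np.exp(a - shift)))) / r
```

The premium grows linearly in both r and the variance, which is consistent with the intuition that small noise or small risk aversion makes the risk-sensitive problem close to the risk-neutral one.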

Contract conditions
We first consider the agents' incentive compatibility constraints (9b). For a contract parameter β, if there is a Nash equilibrium u* of the game among the agents, then u* is an optimal control for β. From the problem formulation, problem (9b) with (3) and (8a)-(8b) can be equated with a discrete-time LEQG dynamic game in the systems and control field [23][24][25]. The solution to this problem is a so-called risk-sensitive feedback Nash equilibrium [25]. Therefore, we can use Lemma 1 to determine the agents' optimal controls u* for β analytically.
Lemma 1: Let us denote by Γ the set of β for which there exist matrices M t + 1 i , Π t i , t = 0, …, T − 1, solving the recursion formulae (11a)-(11c) subject to the implementability condition (12). Then, given β ∈ Γ and a state x at time t = 0, the solution of the optimisation problem (9b), that is the optimal control policy u* for β, is uniquely given by (13) for t = 0, …, T − 1, i ∈ N, and the value of agent i's expected reward functional J i * is given by (14), where G 0 i is obtained through a backward recursion.
As the contract parameter β is fixed, the lemma is proved by integrating the solutions of a 2-agent LEQG dynamic game [25, Corollary 1] and an N-agent LQG dynamic game [26, Corollary 6.1] into that of an LEQG control problem [23, Lemma]. From (14) and the individual rationality constraints (9c), we need to choose a β io satisfying (16). Condition (12) is necessary to execute the recursive computation for implementing a Nash equilibrium [18,23]. Hence, when the implementability condition (12) fails for every β, the given pair of system models and cost functions admits no contract for control and incentive.
Remark 1: We see from (16) that the existence of the set Γ is also a necessary condition for β to satisfy the individual rationality constraint. 
If the noise covariance matrix W t and the risk-aversion coefficient r i are relatively small, or the performance matrix R t i is relatively large, then it is highly likely that the implementability condition (12) holds for any β. We emphasise that M t + 1 i − Π t + 1 i in (11a) is closely related to the effect of the so-called risk premium [19].
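The structure of a risk-sensitive recursion of this kind can be illustrated in the simpler single-agent case. The sketch below is not the N-agent recursion of Lemma 1. It is the classical single-controller LEQG backward recursion for the cost E exp((θ/2) Σ_t (x_t^⊤ Q x_t + u_t^⊤ R u_t)) plus a terminal term, with time-invariant matrices, which is an assumed simplification. The inflated matrix `Pi_tilde` plays a role analogous to M t + 1 i , and the positive-definiteness check mirrors an implementability condition of the type (12). All names are hypothetical.

```python
import numpy as np

def leqg_backward(A, B, D, Q, R, W, Pi_T, theta, T):
    """Single-agent LEQG recursion; returns (Pi_0, [K_0, ..., K_{T-1}]).

    theta > 0 is risk-averse; theta = 0 recovers the usual LQ Riccati recursion.
    """
    Pi = np.array(Pi_T, dtype=float)
    W_inv = np.linalg.inv(W)
    gains = []
    for _ in range(T):
        S = W_inv - theta * D.T @ Pi @ D
        # Breakdown check: the Gaussian integral is finite only if S > 0.
        if np.any(np.linalg.eigvalsh((S + S.T) / 2) <= 0):
            raise ValueError("risk sensitivity too large: recursion breaks down")
        # Risk-inflated cost-to-go; Pi_tilde - Pi acts like a risk premium.
        Pi_tilde = Pi + theta * Pi @ D @ np.linalg.solve(S, D.T @ Pi)
        K = np.linalg.solve(R + B.T @ Pi_tilde @ B, B.T @ Pi_tilde @ A)
        Pi = Q + A.T @ Pi_tilde @ (A - B @ K)
        gains.append(K)
    return Pi, gains[::-1]  # gains ordered forward in time, u_t = -K_t x_t
```

In this simplified setting, a larger noise covariance or a larger θ shrinks S towards singularity, while a larger R keeps the cost-to-go small, which matches the qualitative discussion of when such a condition holds.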
Remark 2: The parameter θ i := − r i < 0 indicates a risk-loving case and we use J i (z i ) = E x [−exp(θ i z i )] instead of (8a) in this case. Then, Lemma 1 holds without the implementability condition (12). If each agent has a risk-neutral reward function described by J i (z i ) = E x [z i ], (9b) is regarded as an LQG dynamic game problem. The solution of the LQG game is a risk-neutral Nash equilibrium [26], which is given by (11a)-(11c) with r i = 0.
Remark 3: The function p t i (x) in (13) can be interpreted as real-time pricing, as shown in Fig. 2 and Section 2.2. Then, under the truth-telling assumption, each agent does not need to obtain the others' model information.
We next consider the aggregator's optimisation problem (9a). As is shown in [17], in order to minimise J 0 (z 0 ) we choose the β io * that satisfies (16) with equality, as in (17). Once (13) and (17) have been obtained, the aggregator's cost functional J 0 (z 0 ) can be expressed as a function of β. Indeed, the function J 0 (z 0 ) can be rewritten as J 0 (z 0 (X(x, u*(β)); β)), where F T T − 1 is an identity matrix with an appropriate size.
Proposition 1: Given a state x at time t = 0, the dynamic contract problem (9a)-(9c) is equivalent to the optimisation problem (19a)-(19c) over β ∈ Γ.
As the objective in (19a) is not necessarily unimodal in β, we derive an optimal contract parameter β* ∈ Γ numerically through a heuristic global optimisation algorithm [27]. As long as each agent is economically rational, the β* obtained using (19a)-(19c) incentivises the agents to take the action desired by the aggregator and yields J i * = J i (z̄ i (x)); otherwise the agents incur a monetary loss.
We see from the above results that the expected compensation risk of the utility can be written as a function of β, namely J u (u*(β)). In the case without service-dependent contract factors (i.e. β il = 0 for all i ∈ N and l ∈ ℒ i ), the resulting control policy u = u o is regarded as the conventional Nash equilibrium without the aggregator because Q T i ≡ 0, i ∈ N. As the corresponding initial incentive β io =: β io o given by (19c) and the quality of service

Simulation
We finally verify the effectiveness of the proposed risk-sharing mechanism through a simulation. Similarly to [8], let us consider a two-area power network, where each area includes one producer and one consumer (Fig. 4). We set 0.2 s as the sampling period of the discrete-time dynamic power system in (3) and T = 150 (i.e. 30 s) as the terminal time. The dynamic power network model basically follows that of [8], and the approximate generation dynamics and the load dynamics are first-order systems with time constants 0.33 and 3, respectively. Then, ξ t i = q t i for all i ∈ N := {1, 2, 3, 4} and x t = (ω t 1 , ζ t 1 , q t g1 , q t d1 , ω t 2 , q t g2 , q t d2 ) ⊤ (p.u.). We thus use the following values: We now focus on two services ℒ = {1, 2}. The quality of the services is given by (y 1 , y 2 ) = (ω T 1 , ω T 2 ) and the task collection of each agent i ∈ N is set as ℒ i = ℒ, which means all the agents N bear the risks caused by all the tasks ℒ. For simplicity, we set the satisfaction level z̄ i (x) = 0 and the service performance weight α l = 1, for all l ∈ ℒ. The (1, 1)-element and (5, 5)-element of the matrix Q u take the value 1; other elements are 0. Under the above setting, we run the proposed contract mechanism to seek an optimal contract parameter β* for the initial state x 0 = (0.05, 0, 0.05, 0.05, 0.05, 0.05, 0.05) ⊤ by using particle swarm optimisation [27]. To implement the optimisation algorithm, we first search for β* ∈ Γ within 0 ≤ β il ≤ 10, i ∈ N, l ∈ ℒ, and then narrow the search region around the tentative β*. We repeat this procedure until the desired optimal β* is found. The optimal contract condition in the case of r i = 100 for all i ∈ N is shown in Table 1, and Fig. 5 shows the time evolution of the average of the states x t , the performance weight ∥ Π t i ∥ 2 and the risk premium ∥ M t + 1 i − Π t i ∥ 2 for 100 noise patterns. 
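A box-constrained search of this kind can be sketched with a minimal particle swarm optimiser. This is a generic textbook variant, not the implementation used for the reported results; the inertia and acceleration coefficients, the function name `pso_minimise` and the toy objective in the usage note are assumptions. In the contract problem, the objective would evaluate the aggregator's cost for a candidate β and could return infinity when β ∉ Γ.

```python
import numpy as np

def pso_minimise(f, lower, upper, n_particles=30, n_iter=200, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Minimise f over the box [lower, upper] with a basic particle swarm."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    pos = rng.uniform(lower, upper, size=(n_particles, lower.size))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                      # personal best positions
    best_val = np.array([f(p) for p in pos])   # personal best values
    g = best_pos[np.argmin(best_val)].copy()   # global best position
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = (inertia * vel + c_personal * r1 * (best_pos - pos)
               + c_global * r2 * (g - pos))
        pos = np.clip(pos + vel, lower, upper)  # keep particles inside the box
        val = np.array([f(p) for p in pos])
        improved = val < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g = best_pos[np.argmin(best_val)].copy()
    return g, float(best_val.min())
```

For example, `pso_minimise(lambda b: float(np.sum((b - 3.0) ** 2)), [0.0, 0.0], [10.0, 10.0])` searches the box 0 ≤ β ≤ 10 in two dimensions. Calling the function again with a tighter box around the returned point mimics the iterative narrowing of the search region.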
We see from Table 1 that the optimal β* has symmetry, for example β 11 * = β 32 * and β 2o * = β 4o *, due to the symmetric physical and economic structure of the network. The penalty rate and the initial incentive for consumers, which have the slower dynamics, are larger than those of producers, and the penalty rate for the service in their own area is lower than that of the opposite area. We also see from J 0 * < 0 and J u (u*) < J u o that the proposed mechanism gives the aggregator an economic benefit and improves the quality of service while the agents satisfy the participation condition. From the left panel of Fig. 5, we can see that all the states, including the tasks ω 1 and ω 2 , converge to the scheduled set-points. From the centre and the right panels of Fig. 5, we can see that the magnitudes of the cost for state deviation and of the risk premium increase as the terminal time approaches. We see from these results that the proposed risk-sharing mechanism can integrate the voluntary agents along the Nash equilibrium through the trade-off between private model information and economic incentive. The time evolution and the optimal contract condition in the case of r i = 0.01 for all i ∈ N are shown in Fig. 6 and Table 2, respectively. We can see that the results for r i = 0.01 have the same tendency as those for r i = 100. As the small risk-aversion reduces the uncertain factors and the major cause of non-optimality tends to be the agents' voluntary control actions, the expected economic benefit of the aggregator and the quality of services for r i = 0.01 are superior to those for r i = 100, and the β* values are larger. The centre and right panels of Fig. 6 also have the same interpretation. When the covariance of the noise disturbance W is small, the resulting behaviour is similar to the small risk-aversion case.
We next consider area-dependent risk-aversion coefficients: r 1 = r 2 = 100 and r 3 = r 4 = 0.01. The results are shown in Table 3. We see that the β* values in Table 3 are basically intermediate values lying between those of Tables 1 and 2. To stabilise the frequency ω 1 of area 1 with the larger risk-aversion coefficient, the corresponding contract factor β i1 * increases.
Finally, we consider area-oriented risk-sharing of tasks under the same risk-aversion coefficient r i = 100, i ∈ N: that is ℒ 1 = ℒ 2 = {1} and ℒ 3 = ℒ 4 = {2} instead of ℒ i = ℒ = {1, 2}. The results are shown in Table 4. Since the agents in each area are given no incentive to stabilise the frequency in the opposite area, the β il values for l ∈ ℒ i in Table 4 are larger than those in Table 1.

Conclusion
This paper has investigated an optimal contract mechanism between an aggregator and risk-averse agents under moral hazard on dynamic electric power systems. We first formulated a novel contract problem to reduce the aggregator's economic risk. Then, we showed that the optimisation problem is a principal-agent problem with dynamic control systems and can be solved by considering the corresponding LEQG dynamic game and employing a numerical optimisation technique. The effectiveness of the economic risk-sharing between the aggregator and the agents induced by the proposed mechanism was also demonstrated through simulation. The results show that the optimal contract incentivises the agents to take the proper control action desired by