Synthetic residential load models for smart city energy management simulations

The ability to control tens of thousands of residential electricity customers in a coordinated manner has the potential to enact system-wide electric load changes, such as reduce congestion and peak demand, among other benefits. To quantify the potential benefits of demand-side management and other power system simulation studies (e.g. home energy management, large-scale residential demand response), synthetic load datasets that accurately characterise the system load are required. This study designs a combined top-down and bottom-up approach for modelling individual residential customers and their individual electric assets, each possessing their own characteristics, using time-varying queueing models. The aggregation of all customer loads created by the queueing models represents a known city-sized load curve to be used in simulation studies. The three presented residential queueing load models use only publicly available data. An open-source Python tool to allow researchers to generate residential load data for their studies is also provided. The simulation results presented consider the ComEd region (utility company from Chicago, IL) and demonstrate the characteristics of the three proposed residential queueing load models, the impact of the choice of model parameters, and scalability performance of the Python tool.


Nomenclature
T probability distribution of inter-arrival times X probability distribution of service times C number of servers (represents the maximum load power that can be served) K queue capacity (maximum number of elements in the queue) Z serving policy (order in which the queue is served) P number of all the elements that can arrive in the queue T / X /C simplified Kendall notation (assumes K = ∞, Z is first come first served, and P = ∞) M t /G/∞ queueing load model with a time-dependent Poisson process, general probability distribution of service times, and infinite capacity t time λ(t) time-varying appliance rate of arrival in the queue D duration of a set of customer appliances P power rating of a set of customer appliances l(t) expected aggregated household load C L (t) openly available hourly load data from any distribution company b min minimum expected residential load for a given time period b max maximum expected residential load for a given time period T simulation time lower bound T simulation time upper bound ψ set of customer appliances E P expected power of appliances in the set E D expected duration of appliances in the set Arrival list of appliances with arrival time and other userdefined appliance attributes Δt i inter-arrival time between appliances i index of the arrival of an appliance in Arrival list app specific appliance M t /G/C queueing load model with a time-dependent Poisson process, general probability distribution of service times, and finite capacity k C user-defined gain

Introduction
Residential loads represent ∼38% of the total energy consumption in the US [1]. Residential demand response (DR) can provide significant benefits in the electricity market: (a) participant financial benefits; (b) market-wide financial benefits; (c) reliability benefits; and (d) market performance benefits [2,3]. Residential DR makes system-wide changes that require tens of thousands of buildings, each with many individual electric energy devices, to be controlled [4]. To evaluate the impact of residential DR strategies on electric power system operation and markets requires largescale residential load data for use in simulation studies. The input parameters to such simulations should include the unique characteristics of each individual residential customer, along with their individual electric energy assets. The aggregate of all such customer load data should behave like a typical city or region. Typically, large-scale customer residential data is either unavailable or proprietary due to privacy concerns [5][6][7]. For example, a load model that makes use of a large proprietary database that includes measurements of appliances and household loads are presented in [5]. In a second study, the interaction of DR and unit commitment of a microgrid is described [6]. The controllable smart loads are modelled with a neural network that uses measured and simulated data from an actual energy hub management system for supervised training. In the case of [7], the loading of a distribution transformer is used to generate load curves. Because the data in these studies is not publicly available, it is not possible to replicate the simulation results, compare new DR and other demand-side management methods to others, or generalise the results to other customers/regions. Smart meter data has also been utilised for modelling individual home appliances in [8], where a hidden Markov model with differential observations is employed to attempt to identify individual appliance usage from the customer load.
Other techniques used are the static customer behaviour method [9][10][11] (i.e. load characteristics are acquired from static parameters); statistical methods [4,12,13] (i.e. commonly requiring customer surveys which may not be available for different regions, customers, or times); and physical methods [14]. The table method used in [9,10] possesses a set of schedulable appliances that is static every day for every customer. Thus, all the homes are assumed to have the same occupancy habits, not representing changes in behaviour through time. In [11], the static customer behaviour method does not contain schedulable appliances, but rather a reference load with upper and lower limits. With the flexibility knowledge of market participants, the convergence of customer interaction on the system is analysed with game theory. In [4,12], aggregated residential DR is performed by scheduling appliances, and [12] considers heating, ventilation, and air conditioning (HVAC). A probabilistic method is proposed in [4], where 18 schedulable appliances are used to statistically generate residential loads to reflect the total energy in an average household. The appliances have a specific percentage penetration, power rating, and start time with mean and standard deviation, however the aggregate load of all customers does not change each day nor consider regional variations. The appliances in [12] are modelled according to [5], with the same limitations for such studies.
A Markov Chain Monte Carlo approach is developed in [13], having the chain represents the state of the resident (e.g. presence of inhabitants), which affects energy consumption. The statistical model is fit using Netherlands public data surveys. The physical method proposed in [14] develops a load simulator for residential customers considering the physical characteristics of a home. The model considers some home configurations, HVAC, and characteristics of other loads (e.g. washer, dishwasher, water heater). The method is intended to model an individual home, and is not suitable to be scaled for a city-sized study.
The literature that presents residential customer load models can be roughly divided into two categories -top-down and bottom-up [13] -which may or may not make use of proprietary data. Topdown models use aggregated load to generate individual load curves [5][6][7]. Solanki et al. [6] know the aggregated load and smart appliances, while Colson and Nehrir [7] only know the aggregated distribution transformer. The bottom-up approach uses the given characteristics of appliances and the statistical behaviour of customers to generate the load profile [4,[9][10][11][12][13][14]. Table 1 summarises the methods in the literature and demonstrates the need for the proposed synthetic residential load models. The table is divided into the required inputs for the literature methods, and the generated outputs. The numerous load models present in the literature (i) are dependent on data that is not publicly available nor applicable to all regions of study, (ii) do not aggregate to the system load curve, or (iii) maintain the same daily customer behaviour throughout the simulation. There is a need for residential load data that is openly available (i.e. results can be replicated and compared), aggregates to a known system curve (i.e. large-scale studies that represent the expected electric energy behaviour of a city), varies through time (e.g. hourly, daily, and seasonal variation), and does not require extensive customer surveys as they may not be available for every region.
This paper addresses the needs of residential large-scale load data by using flexible time-varying queueing models to generate synthetic residential load data to allow simulation studies to be replicated and compared by the research community to new stateof-the-art methods. The proposed top-down and bottom-up approach addresses the challenge of unavailable and proprietary customer data by utilising available aggregate load data for a region as an input to generate individual load profiles comprised of individual residential electric assets. The aggregate of the individual synthetic customer load data generated by the queueing models accurately represents a known system load curve. It contains the time-varying characteristics of an actual power system region.
In this paper, we expand on the M t /G/∞ queueing load model from [15], which incorporates the arrival of appliances that comprise aggregate individual residential customer load with timevarying behaviour. In that work, however, the physical limitations of homes were not considered, the appliance model was limited to active power consumption only, and scalability for city-size synthetic load datasets is not addressed. Thus, the primary contributions in this work are • Design of two new synthetic time-varying residential queueing load models that are capable of incorporating the physical limitations of residential customers without loss of generality; • Creation of a general residential appliance model that possesses many attributes (e.g. power, duration, schedulability, ZIP load parameters), and is extensible according to the necessities of the specific study being performed; and • Development of a scalable Python tool that generates the synthetic residential load models in parallel using highperformance computing.
The remainder of this paper is organised as follows: Section 2 presents the overview of queueing theory and the residential queueing load models. The appliance model and its variations to incorporate voltage dependencies, scheduling windows, and the consideration of non-arriving loads (e.g. HVAC) are presented in Section 3. Section 4 presents the necessary inputs for the proposed synthetic residential load models. The behaviour of the proposed models is presented and validated in Section 5. Concluding remarks on the models are discussed in Section 6.

Overview
Queueing theory models the behaviour of a queue, i.e. waiting in line, and was initially employed in the communication field (e.g. phone operators) to evaluate the performance of the system and determine how to operate the system more efficiently. Queueing theory can model applications in many disciplines and provides  [4,5,[9][10][11][12][13][14] distinct day customer load variability (does not aggregate to system load curve) [5-7, 12, 13] individual customer measurement [5,8] low load variability (might or might not aggregate to system load curve) [9][10][11]14] aggregation of small regions [6,7]  useful insight into distinct systems. In [16], an emergency hospital modelled the random arrival of admitted patients as a queue and the condition of patients as the priority. With the model, the flow of patients can be analysed (e.g. queue waiting time). The authors in [17] make use of queueing models for the load profile of plugin electric vehicles at charging stations, similar to the application in this study. Queueing models are defined by the probability distribution of inter-arrival times, probability distribution of service times, number of servers, queue capacity, size of the population, and a service discipline. Furthermore, the characteristics can be constant or timedependent (e.g. inter-arrival times as a function of time). Fig. 1 presents an overview of the behaviour of a queue, having the probability distribution of inter-arrival times T. An element arrives in the queue at a time and possesses its own characteristics (e.g. priority of arriving jobs in a server). The probability distribution of service times is defined by X, i.e. the distribution of time to serve an arriving element. The number of servers, C, is a physical constraint of the system (e.g. the maximum capacity of the servers to serve the arriving jobs). The characteristics of the queue are its capacity, K, and serving policy, Z (i.e. the maximum number of elements in the queue and the order in which they will be served). The size of the population P, is the number of all the elements that can arrive in the queue.
For all the queueing models presented in this paper, the following three assumptions have been made: the queue length is infinite (i.e. K = ∞, no loss of appliances arriving at the system); the population is infinite (i.e. P = ∞, arrival process is not dependent on the appliances currently present in the system); and the service policy is first come first served. Given the queueing model assumptions, the simplified Kendall notation is used to describe the queue behaviour, i.e. T /X /C. The main advantages of using queueing theory for generating synthetic residential loads are its relation to load: • Residential customers use their appliances according to their individual behavior, thus from the view of an outside observer, the start times of appliances are random. Customers are distinct and assumed to be unable to influence the usage of other customers. Nevertheless, the aggregation of a considerable number of customers is known (i.e. top-down); • Every arriving appliance possesses its different characteristics as generic elements, e.g. electric, temporal, and schedulability; and • Non-homogeneous Poisson process has an average rate of arrivals that varies through time. Thus, the time-varying characteristics of hourly, daily, weekly, and seasonal behaviour of load are naturally considered.
The queueing studies presented in this paper are specifically interested in the probability distribution of inter-arrival times and usage of servers as it applies to the residential electric load. Fig. 2 illustrates the overall behaviour of the queueing load models proposed in this paper for a single home. The arriving elements are appliances with a time-varying exponential distribution. Thus, during peak load, there is a more significant probability of small inter-arrival times (i.e. more electric energy arriving into the system) and the opposite for valleys. The output of the queueing process in this work is the utilisation of 'servers' which corresponds to the active power consumption of a residential home. When representing electric load, the serving capacity need not be represented by an integer (e.g. an arrival process could utilise 50.239 servers at a given time as active power is in the real number set). Sections 2.2-2.4 present the three proposed queueing load models. Fig. 3 presents the overall procedure the load models follow. The load generation for each customer is independent; thus the overall queueing procedure is the same for all customers. Each model possesses unique characteristics, but the overall procedure is maintained for all.

M t /G/∞ queueing load model
The M t /G/∞ queueing load model presented here builds on the previous work in [15]. The queueing load model represents interarrival times as a time-dependent Poisson process (T := M t ), the probability distribution of service times is general (X := G), and the power capacity in the home is infinite (C := ∞). The arrival of appliances in the M t /G/∞ queueing model is time-dependent to capture the temporal behaviour of customers. Furthermore, because there is an infinite capacity, the arriving appliances are served as soon as they arrive in the queue. At the time t, let λ(t) be the time-varying appliance rate into the system, D and P be the random variables describing the duration and power rating of the set of customer appliances, respectively, and l(t) be the expected aggregated household load. The timevarying appliance rate into the system with a Poisson process is described as The mathematical derivation of (1) is presented in [15]. The derivation makes use of the linear-with-time-shift (LIN-S) approximation from [18] and assumes no causality as the data is to be used in simulation studies. The expected home load l(t) is generated with the openly available hourly load data C L (t) from any distribution company.
C L (t) naturally describes the aggregated behaviour of customers in a given region containing its geographic characteristics, e.g. climate and customer preferences. The load data is scaled to generate the expected individual home load (2), where b min and b max are the minimum and maximum expected residential loads for a given time period A flowchart for generating the synthetic residential load model M t /G/∞ is presented in Fig. 4. The user-defined input parameters are the load scaling factors b min and b max , and the range of the simulation time from T to T. The data input is the aggregate load curve C L (t) and the set of appliances ψ, which will be discussed in detail in Section 3.1. The expected power E P and expected duration E D used in (1) are computed for the customer set of appliances ψ. The process starts by initialising the variables where Arrival is a list of appliances with arrival time and other userdefined appliance attributes (e.g. schedulability, ZIP load parameters). This general appliance model allows the researcher to extend the output to match the study of interest (e.g. home energy management). The variable Δt i is the inter-arrival time between appliances (i.e. time for next arrival).
The M t /G/∞ queueing model serves the appliances as soon as they arrive in the system. Making use of (1), the generated customer load when aggregated approximates the known load of the distribution company, C L (t). The number of servers being infinite is justifiable because as soon as an appliance is turned ON, it instantaneously starts operating, i.e. no waiting in line to consume power. Due to the intrinsic random behaviour of the queueing model, however, the generated load peaks could surpass an individual residential building peak load consumption.
The unrealistic load peaks are a result of multiple appliances arriving in a short amount of time, commonly referred to as burstiness [19]. For the same arrival rate λ(t), different burstiness levels can occur. As the elements that form the distribution test case and residential building have physical limitations, to be able to utilise the load generated by the queueing model in distribution system test cases, the residential peak load may need a limit. To address this characteristic, the assumption of an infinite number of servers can be changed, which will be discussed at the two new proposed queueing load models in the following sections.

M t /G/C queueing load model
To address the issue with unrealistic peak load from the M t /G/∞ model, the serving capacity of the queue can be limited. In the M t /G/C queueing model, the system is unable to serve an infinite amount of arriving appliances. In this model, when an appliance arrives in the queue, it may no longer be immediately served, but it will instead depend on the available capacity. This addresses the issue of unfeasible peak load consumption of a residential home, given the physical limitations of the system. In the M t /G/C queueing load model, the power capacity/maximum load consumption that can be served must be defined based on the physical limitations of the power system. One such method for setting the limit while maintaining the independent nature of each customer queue is by defining a gain based on the customer's expected home load. For a given customer, let C be the residential home capacity for the queue model, and k C be the user-defined gain. We define, for a given customer, the residential home capacity as The power capacity C can be defined from (3) or be explicitly chosen by the user. In this work, we assumed all customers have the same C, k C , and scaled l(t) according to (2) and (3), but this can be scaled based on home sizes of the particular study of interest with no loss of generality.
With the limitation of the residential home capacity in the M t /G/C queueing load model, unfeasible peak load consumption, given the physical limitations of a system is no longer created. However, in the M t /G/C queue, there may still be unrealistic peaks for low values of l(t). Thus, even though the load does not surpass the physical limitations of the system, unrealistic load peaks based on customer behaviour may still be generated by the queueing model, as illustrated in Fig. 5. Depending on the analyses being performed by the user, the unrealistic peaks for low values of l(t) may or may not be relevant.
The flowchart for generating the synthetic residential load model using the M t /G/C queue is presented in Fig. 6. The simulation time t is not necessarily the time an appliance will be served. The aggregated power usage from the appliances actual run time (i.e. when the appliance runs, not when it arrives in the queue), given by P h (t), is necessary because there is a power capacity and the appliance may need to wait in the queue, with a service policy of first come first served. The appliance, after being sampled from the set of appliances, will be served as soon as possible given the limitation of the power capacity. The time an appliance will be served t add is searched in the internal loop where δ is the simulation time resolution, hence the load will never be greater than C.

M t /G/C t queueing load model
The M t /G/C addressed the issue of unfeasible peak load consumption given the physical limitations of a system, but it may have unrealistic peaks for low values of l(t) given the expected customer behaviour, as illustrated in Fig. 5. The M t /G/C t queueing load model addresses the issue of unrealistic peaks for low values of l(t) by replacing the constant power capacity C with a timevarying power capacity C t . The time-varying power capacity, C t , can be defined from (4) or any user-defined time-varying curve. In this work, we assumed all customers have the same C t , k C , and scaled l(t) according to (2) and (3), but this can be generalised to a time-varying k C . The time-varying C t is calculated as With the limitation of the power capacity M t /G/C t , unfeasible and unrealistic peak load consumption given the physical limitations of a system and the expected customer behaviour are no longer created, as shown in Fig. 5. The procedure for the M t /G/C t queueing load model is the same as the flowchart in Fig. 6, except C is replaced with C t , thus the power capacity is computed with (4) and the internal loop condition is replaced by 3 Appliance model

Generic appliance model
The queueing model used to generate the synthetic load data is comprised of arriving appliances, therefore it is necessary to consider the assumptions of the appliances. In Sections 2.2-2.4, the synthetic queueing load models randomly sample a set of appliances ψ. Appliances are studied in [20] presenting the electric power consumption of household appliances, and the data is available online. The appliance model in Fig. 7a presents a generic appliance as a block of energy, having constant-power draw over a given time duration. In Fig. 7b, the output of the generic constantpower draw model is compared to a washing machine with timevarying power consumption from [20] with one-minute resolution. The generic appliance model consumes the same amount of energy as the actual appliance, just at different rates throughout the appliance duration. To validate this approximation, we compare the energy consumption of the two models through the appliance duration, Fig. 7c. Although the two models rarely consume the same power at any given time, the total energy consumed is the same and is relatively close throughout the duration. Therefore, for energy management simulation purposes, the generic constant load model is considered adequate and will be used in this paper. Additionally, the output of the M t /G/∞ queueing load model was shown to work with the real appliance power in [15] (i.e. due to G in the queueing models). The models presented in this paper will work with any set of appliances.
To generate the appliance power and duration, two gamma distributions are randomly sampled. Gamma distributions are continuous probability distributions in the positive real number set (a useful characteristic given the appliance power and duration must be positive) defined by two parameters (i.e. shape k and scale θ). The mean of a gamma distribution is E X = kθ, and the variance is Var(X) = kθ 2 . Thus, by defining the mean μ and standard deviation σ, the gamma parameters k and θ are computed with k = μ 2 σ 2 and θ = σ 2 μ. Fig. 8 illustrates the two gamma distributions sampled that determine the power and duration of the appliances. The gamma distributions are made from the expected mean and standard deviation of power in W and the mean and standard deviation of the duration in time.
The Arrival list (i.e. the output) contains the characteristics of all appliance power ratings in W and the time duration in hours, for each and every appliance that has arrived in the queue. Each element in Arrival represents a single appliance. Every row in Arrival represents an appliance i with its characteristics. Thus, Arrival contains all the arriving appliances for a residential customer for the generated simulation period (i.e. from T to T). The appliance model can possess more characteristics depending on the user-defined study of interest without significant changes to the model, further discussed in Sections 3.3 and 3.4.

Overview of appliance model variations
The synthetic residential queueing load models presented are considerably flexible. By making small changes to the appliance and load inputs, the ability of the models to address a wide range of researcher-specific projects can be achieved. Section 3.3 incorporates ZIP polynomial appliance characteristics and reactive power to the generic appliance model (i.e. appliance consumption has active and reactive power that are dependent on the local voltage of the customer) to be used in distribution voltage control studies. Scheduling characteristics of appliances are provided in Section 3.4 to be used in energy management studies. In Section 3.5, it is demonstrated how the reference curve l(t) can be altered so that a defined portion of the customer load is non-arriving, allowing other residential energy devices to be modelled that do not behave as the generic appliance model (e.g. HVAC, batteries, electric vehicles).

ZIP appliance load model
Loads in a low voltage (LV) residential distribution network have a dependency on voltage [21][22][23]. Distributed generation (DG) presents technical challenges in LV distribution networks. The voltage must be maintained within a predefined range; thus the power consumption of appliances and generation from DG in LV networks interact indirectly through voltage. Studies that analyse DG in LV distribution system networks or microgrids could make use of the presented queueing load models (e.g. [24][25][26][27]). The queueing load models generate distinct load consumption patterns for every residential customer, allowing impact assessment of the change in load on the change in local voltage. Considerable changes in appliances have occurred in recent years due to advances in power electronics, leading to a change in load characteristics. ZIP load models have been used to characterise the load dynamics with respect to voltage [22,23]. In New York City, a study was conducted to characterise the effects of voltage variations in load consumption with field validation [22] with the intention of energy conservation using Volt/var control at the substation level [21].
ZIP load models are flexible, the parameters are easily changed to better represent load dynamics, and reduce to other load models (e.g. constant active power, constant active and reactive power, constant resistance, constant impedance). ZIP is a static representation of load models, and assumes that the static characteristics of the active power of a load can be defined by three components: constant impedance (Z p ), constant current (I p ), and constant power (P p ), represented by (6). Similarly, reactive power dynamics can be obtained by (7) using the parameters Z q , I q , and P q . In (6) and (7), V is the local voltage, V 0 is the nominal voltage, P 0 is the reference active power at the nominal voltage, and Q 0 is the reference reactive power at the nominal voltage. Load models that behave as constant resistance and/or impedance have a quadratic dependence on voltage change, where load models that behave as constant current have a linear dependence on any change in voltage The ZIP coefficients have the following two constraints: Z q + I q + P q = 1 (9)

Appliance scheduling characteristics
Energy management benchmarks as in [4,9,10,12,28] schedule appliances in a time window to achieve a goal (e.g. reduce cost of energy). In a similar method to the ZIP characteristics, scheduling parameters can be added for the arriving appliances from the queueing models to allow the synthetic load model presented in this paper to generate inputs for the studies in [4,9,10,12,28] and more energy management studies.
To create the scheduling characteristics for each appliance, it needs to be determined which appliances are schedulable, and the scheduling constraints (e.g. start and end time of the scheduling window). Therefore, the appliances generated by the queueing models require more inputs to define these characteristics. One method for determining if each appliance is schedulable is to generate a random sample and compared it to a user-defined threshold, which determines the percentage of appliances that should be schedulable. To specify the scheduling constraints, two gamma distributions are created and sampled to specify the start of the scheduling window SW start , and the end of the scheduling window SW end (i.e. the time after the arrival plus the duration of the appliance), as illustrated in Fig. 9. The scheduling characteristics could be further extended, depending on the user's study of interest.

Non-arriving loads
Residential customers have more electric energy devices than just appliances modelled by the synthetic queueing load models. Portions of the customer load profile possess climate dependencies, such as HVAC and electric water heaters. In the proposed queueing models, these are modelled as a conjunction of non-schedulable appliances rather than containing their climate dependencies. As the energy consumption of such thermal loads changes based on use and climate, the energy is not able to be directly shifted to a more convenient time (i.e. preheating or cooling a home does not imply that the same amount of energy would be used at a later time). Most studies that consider thermal loads take into account comfort not to violate the comfort of the inhabitants [10,12], thus proper thermal models must be used and need to have their energy separated from conventional appliances and the presented queueing models.
Furthermore, other electric energy devices (e.g. electric vehicles) have energy requirements dependent on usage, and are also not always available at the residence. The use of the large battery from electric vehicles in DR is appealing, but the characteristics of such a resource require different considerations than the appliance model.
The presented queueing load models in Section 2 can be used with a small adjustment to consider electric energy devices with different dependencies, characteristics, and behaviour. Nonarriving appliances (e.g. HVAC) should be removed from l(t) prior to use in the queueing process. At time t, let B(t) be the expected non-appliance load, and B l (t) be the expected household appliance load. If modelling non-appliance load, B l (t) is used in place of l(t) for the queueing load models, and is defined as Fig. 10 demonstrates the removal of the non-arriving appliances from l(t), thus maintaining the aggregated behaviour of the customers. The sum of the generated appliances from B l (t) in the queueing load models plus B(t) from all customers has the same proportions to the sum of the original load l(t) from all the customers. This allows researchers the ability to model any electric energy device in conjunction with the synthetic queueing load models without losing the aggregation property to the known input load curve.

Inputs to the queueing load models
There are two input groups for all the queueing models: the publicly available aggregate distribution system load data and the appliance parameters. The historic data obtained from distribution companies is input as time-series C L (t) from (2). The residential building load curve l(t) is generated with the user-defined choice of b min and b max (i.e. minimum and maximum expected residential load characteristics). The appliances arriving into the queue are created as presented in Section 3.1, thus generating the input set of appliances ψ. To generate ψ, the following parameters must be chosen by the user: • Number of appliances (i.e. size of the set ψ); • Standard deviation and mean power of appliances (i.e. inputs to the gamma distribution). The selection has a direct impact in the expected power of the set E P (y-axis of Fig. 7a); and • Standard deviation and mean duration of appliances (i.e. inputs to the duration gamma distribution). The selection has a direct impact on the expected duration of the set E D (x-axis of Fig. 7a).
Thus, the selection of the gamma distributions to generate the appliances will impact the arrival rate of appliances because (1) is dependent on E P and E D . Notice that the chosen mean of the gamma distribution is not used in (1), but rather E P and E D from the set of generated appliances ψ. The larger the number of appliances in ψ, the closer these values will approximate the gamma distribution. In this paper, the validation of the models presented in Section 5 makes use of the same input parameters unless stated otherwise. Historical data from ComEd [29] (the utility company for Chicago, IL) is utilised as the C L (t) input for the M t /G/∞, M t /G/C, and M t /G/C t queueing load models. The period utilised from ComEd is the year of 2014, having b min = 500, b max = 5000, and k C = 2. The appliance set ψ is generated, as illustrated in Fig. 7, having the gamma distribution parameters as power (W) μ = 500 and σ = 100, and appliance duration (hour) μ = 0.5 and σ = 0.25.

ZIP appliance model input
The ZIP coefficients for residential, commercial, and industrial loads can be estimated using field and/or experimental data, as shown in [22,23]. In [23], 29 appliances have their ZIP parameters modelled; Table 2 presents three example appliances. Furthermore, the appliances also present the number of tested equipment, cutoff  voltage, nominal voltage, and active and reactive power at nominal voltage. Thus, one of the inputs to generate the appliances that incorporate the ZIP polynomial parameters is the complete set of appliances from [23].
In Table 2, the representation of ZIP polynomial parameters does not contain the contribution of each appliance. To randomly generate a set of customer appliances, as explained in Section 3.1, that maintain the aggregated behaviour of the system the contributions of each appliance must be maintained. In [23], the contribution of each appliance is characterised. As the use of appliances changes throughout the year, Table 3 presents the summary of appliance contributions normalised by season. From [23], fall and spring are expected to possess the same behaviour. The complete table with the contribution of appliances for residential customer is presented in [23].
The creation of the set of appliances follows the same process as presented in Section 3.1. Still, more characteristics than the appliance power (i.e. P 0 in the ZIP model) and time duration are needed. When an appliance arrives into the queue, another sample is made to define Q 0 and the ZIP polynomial coefficients. A weighted sample is made on the set of appliances given the overall contribution in the particular season being generated (i.e. from Table 3). From the random sample, the ZIP polynomial coefficients are used directly, and Q 0 of the appliance is defined to maintain its power factor. As the contribution from each appliance changes with the season, the queueing load models must change the set of arriving appliances ψ at every change of season. The results in this paper use the data from [23], but this can easily be altered by the user for the study in question with no loss of generality.

Comparing the three synthetic queueing load models
The three queueing load models are compared in detail for two days containing the minimum and peak hours from the ComEd region in 2014 (May 25, 2014, and July 22, 2014, respectively) in Fig. 11. Each queueing model was used to generate 1000 customers. The top row of Fig. 11 presents the minimum load day, while the bottom row presents the peak day. The three columns from Fig. 11 represent the M t /G/∞, M t /G/C, and M t /G/C t , respectively. In each plot, the dashed purple line is the user-defined expected load curve of a single customer l(t), from (2) with b min = 500 and b max = 5000. The first to third quartiles are represented by the dark shaded blue area, and the minimum and maximum of the 1000 customers are represented by the light shaded blue area. The first quartile of the 1000 customer loads splits the lowest 25% of the customer load data from the highest 75%. Similarly, the third quartile splits the highest 25% of the load data, hence the first to third quartile range represents the active power region where 50% of the customers are located (please note that a specific customer may move in and out of this region from one time period to the next). Fig. 11 presents the mean load value of the 1000 customers from the output of the queueing models as the solid black line, which follows the input reference curve l(t), showing that independently generated loads for each customer have distinct behaviour that, on average, follows the known reference load.
For the M t /G/C and M t /G/C t queueing load models, the red dotted line in Fig. 11 represents the power capacity of a single home (i.e. C and C t , respectively). The power capacities C and C t were computed with (3) and (4), respectively, having k C = 2. The power capacity for the M t /G/C queue does not appear in the minimum power day because C = 10 kW is out of the y-axis range. The M t /G/∞ is unbounded, thus not having a power capacity. In Fig. 11, it can be observed that only M t /G/C t is affected by the power capacity for the minimum load day (i.e. the top row). Because the M t /G/C t active power range for the minimum load day is considerably reduced, there is a low likelihood for the arrival of appliances to be served during the low-capacity time period. This characteristic is elaborated in Section 5.2.
The mean of the 1000 customers follows the reference load l(t), showing that the average behaviour of the independently generated customer loads is known, as presented in Fig. 11. This characteristic can be extrapolated to generate any given number of independent customers N to a known aggregated load curve C L * (t) with For a large number of N, C L * (t) can be used for large-scale smart city-sized energy management studies. In Fig. 12, the aggregate behaviour of all 1000 customers is shown. The output of each of the three models is compared to the sum of the reference curves, C L * (t). As each of the three models closely approximates C L * (t), the proposed models are validated for use in large-scale smart citysized energy management studies. This demonstrates that even with the differences in the individual output of the queueing models for every customer -i.e. random process and independently generated using (11) -(i) the average behaviour of a customer is known, (ii) the aggregate behaviour of all customers represents a  known system curve, and (iii) each customer is independent with a unique load curve.

Impact of queueing model parameter choice
The choice of input parameters impacts the output of the queueing load models. Section 2.2 demonstrates that E P and E D have a direct impact on λ(t) (i.e. the arrival rate of appliances) according to (1). Thus, the gamma distribution for generating the appliances, as presented in Section 3.1, has an impact on λ(t). Additionally, the selection of appliance parameters impacts the size of the output list of appliances generated by the queueing models. A small value of E P and E D requires many more appliances to arrive for the same reference values, and vice versa. For the same input parameters used in Section 5.1, if b min is changed to 100 W, the queueing load model behaves differently for the minimum hour day, as shown in Fig. 13. For the M t /G/∞ queueing model, the choice of b min impacts the arrival rate of appliances. Based on (1) and (2), with an l(t) = 100 W at the minimum time and E P ≃ 500 W (depending on the sampling of the gamma distribution for ψ), the arrival rate λ(t) significantly reduces (i.e. the period between arrivals is increased), as shown in the left plot of Fig. 13.
For the M t /G/C t queueing model, because E P of the appliance set is ∼5 times larger than the reference curve at hour 7:00, the probability of a customer having an appliance smaller or equal to 100 W given the gamma distribution parameters are μ = 500 and σ = 100 is approximately zero. Thus, considering the probability of such appliances arriving at a time in which they could be scheduled is negligible. The first to third quartile range becomes non-existent for a small period of time. The M t /G/C t possesses a power capacity of 200 W for the hour, thus for all 1000 customers, no appliances are present that can be served, making the load equal to zero. Because the service policy is first come first served, the arrived appliances are started as soon as possible in the subsequent hours, the mean value tends towards the power capacity and it takes a few hours to return to steady-state behaviour.
Maintaining the same parameters used in Section 5.1 except with b min still equal to 100 W, we significantly reduced E P and E D (i.e. appliance power (W) μ = 10 and σ = 2, and appliance duration (hour) μ = 0.2 and σ = 0.1) to study the impact of small appliances in the queueing models, as shown in Fig. 14. The mean, complete range, and first to third quartile range behave as expected for M t /G/∞ and M t /G/C t , however, the generated ranges are considerably smaller, i.e. significantly reducing the difference between the individual customers.
In summary, for the output of the queueing models to be stable, it is desirable that multiple appliances can arrive and be served at any moment (i.e. Fig. 13 at 7:00). At the same time, if a large number of small appliances arrive and are served at the same time, the differences between the customers is greatly reduced as shown in Fig. 14. The choice of the gamma distribution parameters for generating the set of appliances ψ, the size of the set of appliances ψ , b min , b max , and queueing model power capacity has an impact on the behaviour of the queueing load models, and should be considered by the user when choosing the input parameters for future studies using these models. These choices will vary depending on the specific user problem of interest (e.g. HEMS, geographical area, and other relevant input for a specific study). The input variables b min and b max appear to be independent, but if it is desirable for the aggregated behaviour of the customers to approximate C L (t), b min and b max are not independent. For the aggregated load of customers to behave as C L (t), the ratio of max (C L ) and min (C L ) needs to be maintained in b max and b min , and b max must be an integer multiple of max (C L ) as the number of customers is an integer. Thus, with a known number of customers and b max , it is known that b min = min (C L ) max (C L )b max .

Queueing model discussion
This subsection presents a comparative discussion of the results presented in Sections 5.1 and 5.2, summarising the behaviour of the queueing models and their interaction with the appliance models. Residential customers naturally follow a predictable behaviour when aggregated, thus the average load approximates to the reference (i.e. scaled system load curve). However, an individual residential customers is expected to possess large variability through time (i.e. a single load in a house greatly impacts its power consumption, but the same load is quite small compared to the system curve). The ranges of generated values of the thousand residential customers are used to demonstrate the behaviour of the generated loads on the valley-and peak-hour days in Table 4. The mean of the generated range from the first to the third quartile and the complete range for the generated synthetic residential load models are presented, showing the impact of the load model and choice of parameters in the model output. Cases 1, 2, and 3 in Table 4 refer to the presented cases in Sections 5.1 and 5.2. Case 1 is presented in Section 5.1 has load scaling parameters k C = 2, b min = 500 (W), and b max = 5 (kW); appliance duration μ = 0.5 (hour), σ = 0.25 (hour); and appliance power μ = 500 (W), and σ = 100 (W). Case 2 presented in Section 5.2 has load scaling parameters k C = 2, b max = 5 (kW), and b min = 100 (W); and the same appliance parameters as Case 1. Finally, Case 3 presented in Section 5.2 has the same load scaling parameters as Case 2; appliance duration μ = 0.2 (hour), σ = 0.1 (hour); and appliance power μ = 10 (W), and σ = 2 (W). In Table 4, the queueing load model time-varying server availability and appliance parameters have a similar impact on the complete ranges in Case 1, however, not on the first to third quartile. Reducing the appliance size will impact the complete and first to third quartile ranges in Case 3. Note we only present the valley-hour day for Cases 2 and 3 because these were used to highlight the impact of appliance size and scale parameters of the reference load on the model output in Section 5.2, hence there is no data presented for the peak-hour day.

ZIP appliances with M t /G/C t
As seen in Section 3.3, the appliance load consumption is affected by the local voltage. A common appliance model that considers this behaviour is the ZIP load model. ZIP load model characteristics were added to the appliances, as shown in Section 4.2 with parameters from [23]. To demonstrate the behaviour of the appliances with the ZIP parameters, 50 homes were generated making use of the M t /G/C t queueing load model with the same parameters from Section 5.1. Appliances arrive into the queue with their individual characteristics (i.e. d i , P 0 , Q 0 , Z p , I p , P p , Z q , I q , P q ). Thus, to visualise the behaviour of the home, the appliances operating simultaneously are aggregated as where ( . ) = {Z, I, P}. Although 50 homes were visualised and analysed, the output of each home was similar, so a single home was chosen to illustrate the average characteristics of the ZIP parameters for the queueing load models. Fig. 15 presents the active and reactive power of the selected home for July 22, 2014, using the M t /G/C t queueing model assuming a variation in voltage at the point of connection to the electric power system. The top plot in Fig. 15 presents the reference curve for the queueing model (black dashed line), active power of the home at nominal voltage (solid black line), queueing model power capacity (red dotted line), and the range the active power (green shaded area) assuming a voltage range from 0.95 to 1.05 p.u. [30]. The bottom plot in Fig. 15 presents the reactive power at nominal voltage (solid black line) and the range of reactive power for the same variation in voltage (blue shaded area). The reactive power of appliances is dependent on the appliance model (i.e. user-defined ZIP characteristics and power factor), not on the queueing model which only governs the arrival of appliances based on the active power (hence the active power reference curve and a power capacity for M t /G/C t ).

Computational performance of the synthetic queueing load models
The synthetic queueing load models are intended for power system studies for large-scale smart city-sized assessments, which contain thousands to millions of electric customers. Thus, any synthetic load generation approach for residential city-size studies must be computationally efficient. As shown in the previous sections, the presented synthetic queueing load models are capable of being independently generated for each customer; thus they can be created in parallel with minimal interprocess communication.  Operations in Python) package to spread the customer load generation work across the available compute resources in single and/or multiple compute nodes. SCOOP maintains a master processor to manage and monitor the work in a worker-pool model, thus when a process finishes generating and saving the data for one customer it is assigned another customer from the work pool until all customer loads have been generated. The data is saved using HDF5 file format and written using the cluster parallel file system that enables concurrent read/write to disk. The scalability of the queueing load models are presented in Fig. 16 for generating 100 annual customer loads for the year 2014 using the different queueing load models averaged over four trials. The y-axis presents the speedup normalised to the runtime for each algorithm using 26 processing elements and compared to the ideal speedup (i.e. linear speedup with each additional processing element). The relative performance of the methods differs due to the internal loop of M t /G/C and M t /G/C t in Fig. 6 for shifting the arriving appliances depending on the available power capacity. The scalability of the three queueing models behaves similarly through 76 processors, but the three methods deviate in performance with 101 processors (i.e. 1 processor per customer and 1 master process). Because of the internal loop, if each process only generates one customer load, the single slowest customer load sets the entire runtime and negatively impacts performance compared to M t /G/∞. The time requirements of the internal loop of M t /G/C t are more demanding (i.e. stricter power capacity), thus having a larger impact on relative performance and scalability.
The absolute times required to generate the data are presented in Table 5, which are averaged over four trials for each case. The individual customer time is computed by multiplying the total time by the number of processing elements and dividing the result by the number of generated customers. Notice that the average time for an individual customer has a little variation with the same queueing load model across differing numbers of processing elements. As the time to create a one-year dataset per customer is relatively low and the data is able to be generated independently in parallel with near-linear speedup, the synthetic load models are promising to generate smart city-sized datasets that aggregate to a known system load curve.

Conclusions
This paper proposed an approach for modelling individual residential customers and their individual electric assets using time-varying queueing models. The queueing load models presented in this paper address the challenges of unavailability and proprietary customer data by using only public available aggregated load data for a region, allowing researchers to replicate results in many studies and compared their methods to the state-ofthe-art. Also, by aggregating to a known system load curve, the economic and technical impacts of new research methods can be better evaluated. The model assumes that the aggregated distribution system behaviour is known while including the stochastic nature of individual customers and their electric assets (i.e. combined top-down, bottom-up modelling). The models are general enough to incorporate other characteristics, such as nonarriving portions of customer loads (e.g. HVAC), voltage dependencies (e.g. ZIP polynomial coefficients), scheduling characteristics, and more depending on the needs of the individual researcher. The models were validated by visualising the differences in output between a thousand customers and by their aggregated load that characterises and follows a known system curve. As the proposed models were shown to scale in a near-linear fashion and individual customer loads can be independently generated, the methods can be used in large-scale demand-side management studies (e.g. Smart City DR) with individual customer load data that maintains the time-varying characteristics of an actual power system region. The future work is to expand the appliance models to include HVAC and other characteristics such as frequency dependencies.

Appendix
The algorithm in Fig. 17 demonstrates the process of generating the synthetic residential load model M t /G/∞, representing the information of Fig. 4 in algorithm form. Fig. 18 demonstrates the algorithm process of generating the synthetic residential load model M t /G/C (i.e. Fig. 6). The initialization of variables, input, and data from Fig. 17 is the same, thus being summarized in Line 1. Note that the procedure for the M t /G/C t queueing load model is the same as Fig. 18, except C is replaced with C t , thus the power capacity computed with (4) and the internal loop condition replaced by 5 (i.e., Line 10).