Volume 17, Issue 4, pp. 554-565
ORIGINAL RESEARCH
Open Access

Gravitational search algorithm-extreme learning machine for COVID-19 active cases forecasting

Boyu Huang, Department of Computer Science, Shantou University, Shantou, China (Methodology, Software)

Youyi Song, Centre for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hong Kong, China (Formal analysis)

Zhihan Cui, Department of Computer Science, Shantou University, Shantou, China (Validation)

Haowen Dou, Department of Computer Science, Shantou University, Shantou, China (Data curation)

Dazhi Jiang, Department of Computer Science, Shantou University, Shantou, China (Writing - review & editing)

Teng Zhou (corresponding author), Centre for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hong Kong, China; School of Cyberspace Security, Hainan University, Haikou, China (Funding acquisition, Project administration)

Jing Qin (corresponding author), Centre for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hong Kong, China (Supervision)

Correspondence: Teng Zhou ([email protected]) and Jing Qin ([email protected])
First published: 11 July 2023

Abstract

Coronavirus disease 2019 (COVID-19) has shattered people's daily lives and is spreading rapidly across the globe. Existing non-pharmaceutical intervention solutions often require the timely and precise selection of small groups of people for containment or even isolation. Although such containment has been successful in stopping or mitigating the spread of COVID-19 in some countries, it has been criticised as inefficient or ineffective, because the statistics for determining cases are time-delayed and complicated to compile. To address these concerns, we propose GSA-ELM, a model based on the gravitational search algorithm, to forecast the global number of active COVID-19 cases. The gravitational search algorithm uses the law of gravitation between particles to guide the motion of each particle towards the global optimum, while an extreme learning machine captures the nonlinearity in the number of active cases. Extensive experiments on the statistical COVID-19 dataset from Johns Hopkins University show that our model achieves a MAPE of 7.79%, which corroborates its superiority to state-of-the-art methods.

1 INTRODUCTION

An outbreak of unexplained pneumonia in Wuhan, Hubei Province, China, in early December 2019 was confirmed to be an acute respiratory infection caused by a novel 2019 coronavirus [1-3]. Although the source of COVID-19 transmission has not been determined, Zhou et al. [4] pointed out that the virus may have been transmitted from bats to humans. Based on its assessment, the World Health Organisation (WHO) declared the outbreak of novel coronavirus pneumonia a global pandemic [5]. With the global spread of COVID-19, many countries declared states of emergency and introduced cross-border travel restrictions one after another; the pandemic has caused a large number of deaths and significant economic devastation. Since countries assess the severity of the outbreak in their regions mainly by the number of active cases, accurate prediction of active COVID-19 cases is essential for deciding on prevention and control measures that reduce the number of infections. The number of active cases is influenced not only by the randomness of infection between virus carriers and healthy individuals but also by seasonal variation [6]. Owing to the variability and stochasticity of infection between individuals [7, 8], constructing a model that forecasts active cases accurately remains imperative.

The virus is transmitted from person to person primarily through respiratory droplets [7-9] and causes a range of symptoms and severe sequelae [10-12]. Nevertheless, the exact virological and epidemiological characteristics of this third zoonotic coronavirus, including its transmissibility and mortality, are not yet known. Deep learning can learn and model nonlinear, complex relationships and has therefore received interest and attention in various fields [13-16]. However, with such a large number of active cases worldwide, training deep learning models would be time-consuming and vulnerable to overfitting [17]. We therefore need to formulate a model that addresses these problems to forecast COVID-19 active cases.

The extreme learning machine (ELM) is a feedforward neural network first proposed by Huang et al. [18] in 2006. ELM has excellent generalisation performance and extremely fast learning: instead of adjusting the weights by gradient-based backpropagation, it sets the output weights via the Moore-Penrose (MP) generalised inverse, so it overcomes the difficulty of training on large amounts of data [19, 20]. It has further been proved that if the activation function of the hidden layer is infinitely differentiable on any interval, the input weights and hidden-layer biases can be set randomly before training and kept fixed during training. ELM is currently applied not only to regression and fitting problems [21, 22] but also to classification [23], pattern recognition [24] and other fields. Meanwhile, a variety of improved methods and strategies have been proposed [25, 26], greatly improving the performance of ELM and making its importance increasingly evident.

Nevertheless, ELM generates the input-layer weight matrix and the hidden-layer bias values by random initialisation, and these parameter settings play a significant role in the final prediction performance of the model. In addition, because the active-case data are nonlinear, a model lacking generalisation ability may suffer performance degradation on samples that do not appear during training. To address these limitations, we utilise the Gravitational Search Algorithm (GSA) [27], which uses the gravitational law between particles to guide the motion of each particle towards the global optimum, to search for the optimal weights and biases of the extreme learning machine for case forecasting; we refer to the resulting model as GSA-ELM. This model not only reduces the complexity of the network to avoid overfitting, but also ensures the reliability and stability of the prediction results and improves its competitiveness relative to other methods. We summarise the principal contributions of this work as follows.
  • We optimise the weights and biases of the extreme learning machine by the gravitational search algorithm to improve its performance.

  • We innovatively apply deep learning methods to the prediction of COVID-19 and successfully cope with the complexities of its nonlinearity.

  • We evaluate our hybrid learning model on a benchmark set and compare it with several state-of-the-art machine learning forecasting models to demonstrate the superiority of our model.

The rest of the paper is structured as follows. Section 2 reviews related works. Section 3 presents the methodology. Section 4 reports an empirical study on real-world data compiled by Johns Hopkins University. Sections 5 and 6 discuss the findings and conclude the paper.

2 RELATED WORKS

Simulation of epidemics. Several epidemiological and clinical characterisation studies have been conducted on patients with this virus to analyse its biological features and viral pathogenesis [28-31]; this will help medical practitioners develop vaccines faster and effectively prevent the spread of the virus in the population. Anastassopoulou et al. [32] made a preliminary prediction of the evolution of the outbreak by means of data modelling. The Susceptible-Infected-Susceptible (SIS) [33, 34], Susceptible-Infected-Recovered (SIR) [35] and Susceptible-Exposed-Infected-Recovered (SEIR) [36, 37] models provide an alternative approach to epidemic simulation, and many research works have been reported. The results show that SIS, SIR and SEIR models can reflect the dynamics of different epidemics, and these models have also been applied to COVID-19 [38, 39].

Optimisation algorithms. A large number of excellent optimisation schemes have been applied to practical problems. Because no binary version of the Rat Swarm Optimiser (RSO) [41] existed for binary optimisation problems, Awadallah et al. [40] proposed an enhanced binary RSO to handle the Feature Selection (FS) problem, and its strong results proved the feasibility of the proposed variant. Thawkar et al. [42] proposed a hybrid feature selection method based on the Butterfly Optimisation Algorithm (BOA) [43] and the Ant Lion Optimiser (ALO) [44] for breast cancer prediction, which effectively improved optimisation and classification accuracy. To remedy the low exploration capability of the traditional Whale Optimisation Algorithm (WOA) [45], Chakraborty et al. [46] proposed mWOAPR, a novel variant that improves the exploration capability of the algorithm while balancing global and local search, and successfully applied it to image segmentation problems. However, none of these studies considered the predictive potential of a hybrid learning model in epidemiology; this study is the first to apply such a model to the task of predicting active COVID-19 cases.

Heuristic algorithms. Heuristic algorithms are defined relative to exact optimisation algorithms, which seek the optimal solution for every instance of a problem; heuristics instead seek good solutions at acceptable cost. At the present stage, heuristic algorithms are dominated by nature-inspired algorithms, which have achieved great success. Typical works include the Slime Mould Algorithm (SMA) of Li et al. [47], inspired by the biomotor behaviour of slime moulds: by studying the behavioural pattern of slime mould single-cell growth and applying the characteristics of its simulated behaviour in computer simulations, optimised results can be obtained. Tu et al. [48] proposed the Colony Predation Algorithm (CPA), which follows the strategy used by animal hunting groups, using the success rate to adjust the strategy and simulating the selective abandonment behaviour of hunting animals; it shows competitive, superior performance in different search environments. The Harris Hawks Optimisation (HHO) algorithm designed by Heidari et al. [49] achieves population evolution through mathematical modelling of the different predation strategies of Harris hawks, offering strong optimisation capability without tedious parameter tuning.

3 METHODOLOGY

In this section, we first construct the extreme learning machine (ELM) to predict the quantity of active cases. Then, we utilise the gravitational search algorithm to globally optimise the combination of parameters for the ELM.

3.1 Extreme learning machine

The Extreme Learning Machine (ELM) is a novel fast learning algorithm for a forward-propagating neural network structure. Compared with traditional neural networks, especially single-hidden-layer feedforward neural networks (SLFNs), ELM reduces the amount of computation by randomly initialising the input weights and biases [50], which need no further adjustment once set. In addition, the connection weights between the hidden and output layers are not adjusted iteratively but are determined by solving a system of linear equations. The experiments in [51] prove that the algorithm not only has high generalisation ability that guards against overfitting, but can also outperform traditional machine learning algorithms while guaranteeing learning accuracy. The three-layer structure of ELM is shown in Figure 1.

Figure 1: Extreme learning machine. The forward-propagating neural network has only one hidden layer, whose parameters include input weights ω, output weights β, and hidden-layer biases b.

An overview of the primary philosophy of ELM is provided below. For an ELM with a three-layer structure, suppose there are $Z$ training samples $\{(x_i, t_i)\}_{i=1}^{Z}$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^\top \in \mathbb{R}^n$ and $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^\top \in \mathbb{R}^m$ represent the input data and the ground truth of the $i$th sample, respectively. This neural network with a single hidden layer can be represented as

$$\sum_{j=1}^{k} \beta_j \, g\left(\omega_j^\top \cdot x_i + b_j\right) = t_i, \quad i = 1, \ldots, Z, \qquad (1)$$

where $\omega_j = [\omega_{j1}, \omega_{j2}, \ldots, \omega_{jn}]^\top$ is the input weight vector that links the input nodes to the $j$th hidden neuron, $k$ is the number of neurons in the hidden layer, $g(x)$ is the activation function of the hidden layer, and $b_j$ denotes the bias of the $j$th hidden neuron. We let $h_{ij} = g(\omega_j^\top \cdot x_i + b_j)$, where $\omega_j^\top \cdot x_i$ is the inner product of $\omega_j$ and $x_i$, representing the input value of the $j$th hidden neuron. Equation (1) can be further expressed in matrix form as

$$H\beta = T, \qquad (2)$$

where $H = \{h_{ij}\}_{i=1,\ldots,Z,\; j=1,\ldots,k}$ is the output matrix of the hidden layer. The matrix $\beta = [\beta_1, \beta_2, \ldots, \beta_k]^\top$ represents the output weights of the network, in which $\beta_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jm}]^\top \in \mathbb{R}^m$, $j = 1, \ldots, k$, is the weight vector connecting the $j$th hidden neuron to the output nodes, and $T = [t_1, t_2, \ldots, t_Z]^\top$ is the expected output of the network.

In the ELM algorithm, once the input weights and hidden-layer biases are determined randomly, the matrix $H$ is fixed, and training the single-hidden-layer network reduces to solving a linear system. The output weight matrix is obtained by

$$\hat{\beta} = H^{\dagger} T, \qquad (3)$$

where $H^{\dagger}$ is the Moore-Penrose (MP) generalised inverse of $H$.
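For concreteness, the following is a minimal NumPy sketch of the ELM just described, assuming a sigmoid activation for $g$; the class name and interface are illustrative rather than the authors' released code.

```python
import numpy as np

class ELM:
    """Single-hidden-layer ELM (Equations 1-3): random hidden parameters,
    output weights solved by the Moore-Penrose pseudoinverse."""

    def __init__(self, n_inputs, n_hidden, rng=None):
        rng = rng or np.random.default_rng(0)
        self.omega = rng.uniform(-1, 1, size=(n_inputs, n_hidden))  # input weights
        self.b = rng.uniform(-1, 1, size=n_hidden)                  # hidden biases

    def _hidden(self, X):
        # h_ij = g(omega_j . x_i + b_j), with a sigmoid activation g
        return 1.0 / (1.0 + np.exp(-(X @ self.omega + self.b)))

    def fit(self, X, T):
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T  # Equation (3): beta = H^dagger T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta  # Equation (2): H beta
```

Training therefore costs a single pseudoinverse rather than iterative backpropagation, which is consistent with the very small ELM training times reported later in Table 2.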

3.2 Hybrid GSA-ELM algorithm

ELM inevitably has drawbacks in the learning process. The random selection of its parameters produces a series of non-optimal values, and these settings play an important role in the final prediction performance of the model. As a result, more hidden-layer nodes are required than in traditional learning algorithms, which harms generalisation performance and can leave the system in a pathological state. Moreover, only the input parameters are used for computation during learning, while the actual output values, which are very valuable, are ignored. In addition, because the active-case data are nonlinear, the model may degrade on samples unseen during training owing to its limited generalisation capability, and the accuracy of a plain ELM on COVID-19 active-case prediction does not meet practical requirements. We therefore propose to use the gravitational search algorithm to find the internal network parameters best suited to the ELM for predicting the number of active COVID-19 cases, thus improving the overall performance of the model.

The overall architecture of the GSA-ELM hybrid model is shown in Figure 2. We first initialise the parameters of the GSA and the particle positions of the population, and evaluate the fitness value of each particle. Next, we calculate the interaction forces between the particles to update the velocity and position of each particle. The algorithm iterates until the termination condition is satisfied, at which point the global optimal solution returned by the GSA is used as the parameter setting of the ELM. Finally, the ELM generates our model's prediction of the number of future active cases. We present the details of the gravitational search algorithm below.

Figure 2: The general structure of our proposed GSA-ELM model for COVID-19 active cases forecasting.

In 2009, Rashedi et al. [27] proposed a novel optimisation algorithm, the Gravitational Search Algorithm (GSA), based on the law of gravity and the interaction between particles. The mass of an individual measures its merit: the better its position, the greater its mass. Under the effect of gravity, individuals are attracted to one another and move towards the individual with the greatest mass. As the movement continues, the whole group eventually gathers around the individual with the greatest mass, which occupies the best position; hence the algorithm is able to obtain the optimal solution to the problem. In our model, we consider a search space with $N$ particles and define the position of the $r$th particle as

$$\mathcal{M}_r = \left(\mu_r^1, \ldots, \mu_r^d, \ldots, \mu_r^D\right), \quad r = 1, 2, \ldots, N, \qquad (4)$$

where $\mu_r^d$ represents the position of the $r$th particle in the $d$th dimension. The dimension of each particle is $D = k(n+1)$, where $k$ and $n$ are the numbers of neurons in the hidden and input layers of the ELM, respectively. Similarly, we define the velocity of the $r$th particle as

$$\mathcal{V}_r = \left(v_r^1, \ldots, v_r^d, \ldots, v_r^D\right), \quad r = 1, 2, \ldots, N, \qquad (5)$$

where $v_r^d$ denotes the velocity of the $r$th particle in the $d$th dimension.
Based on the principle of force interaction, at the $i$th iteration we define the force exerted on the $r$th particle by the $s$th particle in the $d$th dimension as

$$F_{rs}^{d} = G(i)\,\frac{M_r(i) \times M_s(i)}{R_{rs}(i) + \sigma}\left(\mu_s^d(i) - \mu_r^d(i)\right), \qquad (6)$$

where $R_{rs}(i)$ denotes the Euclidean distance between particles $r$ and $s$, $M_r$ and $M_s$ are respectively the passive gravitational mass of particle $r$ and the active gravitational mass of particle $s$, and $\sigma$ is a small constant that prevents the denominator from becoming zero. $G(i)$ denotes the gravitational constant, whose value decreases over the iterations to control the search accuracy:

$$G(i) = G_0 \, e^{-\epsilon \frac{i}{I}}, \qquad (7)$$

where $G_0$ is the initial value of the gravitational constant, $I$ is the total number of iterations of the algorithm, and $\epsilon$ is a constant that requires tuning.
To introduce stochasticity into the algorithm, the total force acting on the $r$th particle in the $d$th dimension is taken to be a randomly weighted sum of the forces exerted by the other particles:

$$F_r^d = \sum_{s=1,\, s \neq r}^{N} rand_s \, F_{rs}^d(i), \qquad (8)$$

where $rand_s \in (0, 1)$ is a random variable following a uniform distribution, added to give the search a random character.
The algorithm must begin with a full exploration of the search space to avoid falling into a local optimum. As the iterations proceed, exploration gradually decreases and exploitation gradually increases. To improve the performance of GSA by controlling exploration and exploitation, a good compromise is to reduce the number of attracting particles over time. Specifically, only the δbest best particles attract the others, where δbest is a function of time that decreases linearly over the iterations until, finally, only one particle exerts force on the others. Equation (8) is therefore modified as follows:
$$F_r^d = \sum_{s \neq r,\; s \in \delta_{best}} rand_s \, F_{rs}^d(i), \qquad (9)$$

where $\delta_{best}$ is the set of the first $\delta$ particles with the largest masses and the best fitness values. According to Newton's second law, the acceleration of particle $r$ at the $i$th iteration in the $d$th dimension is defined as
$$a_r^d(i) = \frac{F_r^d(i)}{M_r(i)}, \qquad (10)$$

where $M_r$ represents the inertial mass of the $r$th particle. The next velocity of particle $r$ is a fraction of its current velocity plus its acceleration, which leads to the following velocity and position updates:

$$v_r^d(i+1) = rand_r \times v_r^d(i) + a_r^d(i), \qquad (11)$$

$$\mu_r^d(i+1) = \mu_r^d(i) + v_r^d(i+1), \qquad (12)$$

where $v_r^d(i+1)$ and $\mu_r^d(i+1)$ denote the velocity and position of particle $r$ after the $(i+1)$th iteration, respectively, and $rand_r \in (0, 1)$ is again a uniformly distributed random variable.
The mass of each individual is obtained by fitness assessment: the greater the mass, the better the position, the stronger the attraction, and the slower the movement. The inertial and gravitational masses of particle $r$ are updated according to

$$m_r(i) = \frac{fit_r(i) - worst(i)}{best(i) - worst(i)}, \qquad (13)$$

$$M_r(i) = \frac{m_r(i)}{\sum_{s=1}^{N} m_s(i)}, \qquad (14)$$

where $fit_r(i)$ and $M_r(i)$ respectively represent the fitness value (here, the MAPE) and the mass of the $r$th particle at the $i$th iteration. The quantities $best(i)$ and $worst(i)$ are defined as

$$best(i) = \min_{s \in \{1, 2, \ldots, N\}} fit_s(i), \qquad (15)$$

$$worst(i) = \max_{s \in \{1, 2, \ldots, N\}} fit_s(i), \qquad (16)$$

that is, the best and worst fitness values among all particles at the $i$th iteration.
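The following sketch implements one GSA iteration from Equations (6)-(16). The linear shrinking schedule for δbest and the handling of the degenerate case $best(i) = worst(i)$ are our assumptions, as the paper does not specify them; note also that the passive mass $M_r$ in Equation (6) cancels against the division by $M_r$ in Equation (10).

```python
import numpy as np

def gsa_step(pos, vel, fitness, i, I, G0=100.0, eps=20.0, sigma=1.0, rng=None):
    """One GSA iteration following Equations (6)-(16).

    pos, vel : (N, D) arrays of particle positions and velocities
    fitness  : (N,) fitness values (lower is better; here, the MAPE)
    i, I     : current iteration index and total number of iterations
    """
    rng = rng or np.random.default_rng()
    N, D = pos.shape

    # Masses from fitness, Equations (13)-(16).
    best, worst = fitness.min(), fitness.max()
    m = np.ones(N) if best == worst else (fitness - worst) / (best - worst)
    M = m / m.sum()

    # Decaying gravitational constant, Equation (7).
    G = G0 * np.exp(-eps * i / I)

    # delta_best: shrink the set of attracting particles linearly from N to 1
    # over the run (a common schedule; the paper does not state the exact one).
    n_best = max(1, int(round(N - (N - 1) * i / I)))
    elite = np.argsort(fitness)[:n_best]

    acc = np.zeros_like(pos)
    for r in range(N):
        for s in elite:
            if s == r:
                continue
            R = np.linalg.norm(pos[r] - pos[s])  # Euclidean distance R_rs
            # Randomly weighted force, Equations (6) and (9), already divided
            # by the inertial mass M_r of Equation (10), which cancels.
            acc[r] += rng.random() * G * M[s] * (pos[s] - pos[r]) / (R + sigma)

    vel = rng.random((N, 1)) * vel + acc  # Equation (11)
    pos = pos + vel                       # Equation (12)
    return pos, vel
```

In the hybrid model, each $D$-dimensional position with $D = k(n+1)$ is reshaped into the $k \times n$ input weight matrix and the $k$ hidden biases of the ELM, and a particle's fitness is the MAPE of the resulting network, so this loop drives the ELM of Section 3.1 towards its best parameter setting.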

4 CASE STUDY

The data utilised in this section are from the data repository of the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Centre for Systems Science and Engineering (JHU CSSE). The second subsection gives a brief description of the criteria for experimental evaluation. Our experiments were conducted on an Intel Core i7 at 3.60 GHz with 8 GB RAM, running Python 3.7.

4.1 Data description

The active cases for the case study were obtained from the data repository of the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Centre for Systems Science and Engineering (JHU CSSE), which includes daily case reports for COVID-19 worldwide, daily status reports for the United States, and time-series summaries. The timestamps of all cases are given in UTC (GMT+0). The numbers of active cases are aggregated by country and region from January 27, 2020 to December 21, 2020.
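As an illustration, the daily global active-case series can be derived from the public JHU CSSE time-series files as sketched below, assuming the repository's standard CSV layout; the paper does not publish its preprocessing code, so this is our reading (active = confirmed - deaths - recovered).

```python
import pandas as pd

# JHU CSSE time-series files: the first four columns are Province/State,
# Country/Region, Lat, Long; the remaining columns are dates.
BASE = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
        "csse_covid_19_data/csse_covid_19_time_series/")

def global_series(name):
    df = pd.read_csv(BASE + f"time_series_covid19_{name}_global.csv")
    return df.iloc[:, 4:].sum(axis=0)  # sum over all countries and regions

# Active cases = confirmed - deaths - recovered, restricted to the study period.
active = (global_series("confirmed")
          - global_series("deaths")
          - global_series("recovered"))
active = active.loc["1/27/20":"12/21/20"]
print(active.tail())
```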

4.2 Evaluation criteria

This experiment adopts two common evaluation metrics to assess the performance of our model. These two criteria are the mean absolute percentage error (MAPE) and the root mean square error (RMSE), which are defined as follows.
$$RMSE = \sqrt{\frac{1}{S}\sum_{s=1}^{S}\left(\hat{f}_s - f_s\right)^2}, \qquad (17)$$

$$MAPE = \frac{1}{S}\sum_{s=1}^{S}\left|\frac{\hat{f}_s - f_s}{f_s}\right|, \qquad (18)$$

where $S$ is the number of test samples, and $f_s$ and $\hat{f}_s$ are the true measured and predicted values of the $s$th test sample, respectively.
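Equations (17) and (18) translate directly into code; the only point to note is that the paper reports MAPE as a percentage.

```python
import numpy as np

def rmse(f_true, f_pred):
    """Root mean square error, Equation (17)."""
    f_true, f_pred = np.asarray(f_true, float), np.asarray(f_pred, float)
    return np.sqrt(np.mean((f_pred - f_true) ** 2))

def mape(f_true, f_pred):
    """Mean absolute percentage error, Equation (18), as a percentage."""
    f_true, f_pred = np.asarray(f_true, float), np.asarray(f_pred, float)
    return 100.0 * np.mean(np.abs((f_pred - f_true) / f_true))
```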

4.3 Experiment & results analysis

The projections for COVID-19 active cases are intended to anticipate fluctuations in the total over time, rather than per-day variations by region. Hence we summed the original data to obtain the total number of active cases for each day as one data sample. The samples were ordered as a time series and extracted with a rolling window of step 1: every 5 consecutive days form one group, giving a total of 326 groups. Each group is divided into two parts, the first consisting of the data from the first 4 days and the second consisting of the data from the last day, which is the target value to be predicted. Hence, the ELM network has a 4-dimensional input vector and a 1-dimensional output vector. The overall data are likewise divided into two parts: 80% of the data are used to train our model, while the remainder is used to test its performance. In this study, the two evaluation criteria RMSE and MAPE are applied to gauge the performance of GSA for ELM optimisation.

To obtain better training effectiveness for the ELM model, we set the number of neurons in the hidden layer to 100, while the specific GSA parameter settings are shown in Table 1. In our experiments we also analysed the effect of the number of iterations on the prediction performance of the model by varying it from 1 to 100. Figure 3 compares the MAPE for different numbers of iterations. We can observe that once the number of iterations exceeds 30, the particles in the GSA have moved to the optimal position and the MAPE converges to a stable value; beyond this point, further iterations do not improve the model significantly. We therefore set the number of iterations to 100 to ensure that the model converges fully and loses no accuracy to insufficient training.
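A sketch of the rolling-window construction just described follows; with the 330 daily totals from January 27 to December 21, 2020, a 5-day window with step 1 yields exactly the 326 groups reported. Function and variable names are ours.

```python
import numpy as np

def make_windows(series, window=5):
    """Roll a window of 5 consecutive days with step 1:
    the first 4 days form the input, the 5th day is the target."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window + 1          # 330 - 5 + 1 = 326 groups
    X = np.stack([series[i : i + window - 1] for i in range(n)])
    y = series[window - 1 :]
    return X, y

# Chronological 80%/20% split, as in the experiments:
# X, y = make_windows(active.values)
# cut = int(0.8 * len(X))
# X_tr, y_tr, X_te, y_te = X[:cut], y[:cut], X[cut:], y[cut:]
```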

TABLE 1. The GSA parameter settings.

  Parameter                            Value
  Initial gravitational constant G0    100
  Constant ϵ                           20
  Number of iterations I               100
  Number of particles N                40
  Constant σ                           1
Figure 3: MAPE of the prediction results for different numbers of iterations. After the number of iterations exceeds 30, the MAPE value tends towards stability.

We also compare our model with one that uses the particle swarm optimisation (PSO) algorithm to optimise the ELM, to demonstrate the superiority of GSA in determining the ELM parameters. For a fair comparison, all shared settings are kept identical, except for the parameters specific to each algorithm. The PSO parameter settings follow our previous work [19, 52]: the social learning factor c1 and the individual learning factor c2 are set to 0.55 and 0.35, respectively, the inertia weight ω that regulates the search range over the solution space is set to the default 0.9, the swarm size is 20, and the maximum number of iterations is likewise 100.

Table 2 shows the prediction results of the ELM with parameters optimised by the GSA and PSO algorithms, respectively, against the ELM without any optimisation algorithm. Optimising the ELM parameters with either GSA or PSO effectively improves its performance. Compared with the plain ELM, GSA helps the network find the input weight vectors and bias values best suited to the prediction task by simulating the motion of particles; it not only helps the ELM escape local extrema and obtain optimal results, but also gives the optimised model a degree of generalisation ability. In particular, the GSA-ELM model proposed in this paper has obvious advantages over the PSO-ELM model in both metrics. For functions with multiple local extrema, the PSO algorithm easily falls into a local extremum and returns suboptimal results; moreover, although PSO offers global search, its convergence to the global optimum has not been strictly proven. The number of active COVID-19 cases varies nonlinearly with time and the ELM parameter optimisation problem has multiple local extrema, so PSO performs worse here than the GSA algorithm.
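For reference, a generic sketch of the PSO update used by the PSO-ELM baseline, with the stated settings (c1 = 0.55 for the social term, c2 = 0.35 for the individual term, ω = 0.9); the exact PSO variant in [19, 52] may differ in detail.

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, c1=0.55, c2=0.35, w=0.9, rng=None):
    """One standard PSO velocity/position update: c1 weights the social
    (global-best) term and c2 the individual (personal-best) term."""
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (gbest - pos) + c2 * r2 * (pbest - pos)
    return pos + vel, vel
```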

TABLE 2. Comparison of forecasting performance and computational time overhead for different ELM-based models.

  Method     MAPE   RMSE        Training (s)   Testing (s)
  ELM [18]   7.09   831957.76   0.033          1.9 × 10−3
  PSO-ELM    5.81   498020.46   1039.87        2.9 × 10−3
  GSA-ELM    3.64   397025.31   482.53         2.4 × 10−3

  • Note: Bold values denote the best one among that column.

We also report the time overhead of the three models for training and testing. In the training phase, the optimisation-based ELM models all take more time; the plain ELM simply assigns its parameters randomly, skipping the optimisation process, and is accordingly less accurate. GSA converges faster than PSO and yields better predictions, making it more practical for online active-case prediction. In the testing phase, the time overheads of the three models are roughly similar, because the inference of the final prediction depends mainly on the speed of the ELM.

4.4 Performance evaluation

Active-case prediction was also performed with several conventional machine learning models, with the dataset divided in the same way as before for the comparison experiments. These models were implemented with the sklearn library, and their parameters mostly use the sklearn defaults. Table 3 exhibits the prediction performance of each model. The individual models are described below, followed by a configuration sketch.

KNN [56]: finds the k nearest neighbours of a sample and assigns the average of their attribute values to that sample. The choice of k has a large impact on prediction performance: a smaller k amounts to predicting from training instances in a smaller neighbourhood, making the overall model more complex and prone to overfitting, while a larger k predicts from a larger neighbourhood, reducing the estimation error of learning at the cost of a larger approximation error. Considering the above, we experimentally set k to 4 to give the KNN model good prediction performance and robustness to noise.

DecisionTree (DT) [54]: the study adopts a decision tree based on the Classification and Regression Tree (CART) algorithm for prediction. CART requires no a priori assumptions and is highly resistant to noise and missing data. The maximum tree depth is set to 5.

SVR [53]: support vector regression locates a regression plane such that all data in a collection are as close as possible to that plane. This experiment applies the best-fitting Gaussian RBF kernel function.

RidgeRegression [58]: a regularised version of linear regression; it is a biased-estimation regression method, essentially a modified least-squares estimation. The regularisation parameter α is set to 0.5.

ANN [57]: an artificial neural network continuously learns to extract features from the data, adjusting the strength of each connection by training the network weights until the output of the top layer gives the correct answer. We use 1 hidden layer with 50 nodes.

KF [55]: Kalman filtering (KF) provides an optimal estimate of the system state from the system input and output observations. We set the process-noise covariance Q to 0.1 × I, where I is the identity matrix, the measurement-noise variance to 0, and the covariance of the initial state-estimation error to 10−2 × I.
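A sketch of how these baselines can be configured in sklearn with the hyperparameters stated above; all other settings are sklearn defaults. The Kalman filter is not part of sklearn and is omitted here, and max_iter for the MLP is our assumption.

```python
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Baseline configurations mirroring the descriptions above.
baselines = {
    "KNN":   KNeighborsRegressor(n_neighbors=4),
    "DT":    DecisionTreeRegressor(max_depth=5),
    "SVR":   SVR(kernel="rbf"),
    "Ridge": Ridge(alpha=0.5),
    "ANN":   MLPRegressor(hidden_layer_sizes=(50,), max_iter=1000),
}

# for name, model in baselines.items():
#     model.fit(X_tr, y_tr)
#     print(name, mape(y_te, model.predict(X_te)))
```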

In both the GSA and PSO algorithms the particles are distributed randomly, so the results differ from run to run. The GSA-ELM and PSO-ELM experiments were therefore repeated over 100 runs, multiple experiments were conducted for each model in the comparison, and the final results were averaged to ensure fairness. The comparison of prediction performance and time overhead between the different methods is shown in Table 3. We can observe the following.

1) GSA-ELM outperforms the other traditional machine learning methods on both metrics. For example, its MAPE value decreases by 14.8% compared with ridge regression, which has the smallest MAPE among the other five models. SVR solves for support vectors via quadratic programming, which involves computing a matrix of order m (the number of samples); when m is large, storing and computing this matrix consumes a great deal of memory and time. Moreover, regression performance depends mainly on the choice of kernel function, so for the practical problem of COVID-19 active-case prediction, choosing an appropriate kernel for the actual data remains very challenging. The Kalman filter does not achieve optimal estimation in the nonlinear setting of COVID-19 active cases, because it only provides accurate estimation for linear process and measurement models. KNN predictions are easily affected by noisy data: the number of active COVID-19 cases fluctuates from day to day, and new samples are biased towards the dominant category in the training sample, which easily leads to prediction errors; KNN also has high computational complexity and memory consumption, because the distance to every known sample must be computed to find the K nearest neighbours. Ridge regression is essentially a modified least-squares estimation that obtains more realistic and reliable regression coefficients at the cost of abandoning the unbiasedness of least squares, losing some information and reducing accuracy. The ELM has good generalisation performance and remains highly robust to the background noise affecting the COVID-19 active-case counts; nevertheless, the RMSE of GSA-ELM decreases by 25.6% compared with the ELM, which has the smallest RMSE among the baselines, proving the effectiveness of the GSA's global optimisation of the ELM parameters.

2) GSA-ELM relies on continually shifting the positions of the population's particles to find the global optimum of the problem, so its training phase takes longer than that of less complex methods. In the testing phase, however, ELM is clearly the fastest: during training, the GSA determines the optimal input weights and hidden-neuron biases and the ELM derives the hidden-to-output weights by an inverse operation, so at test time the ELM only needs a simple matrix multiplication to predict the number of active cases. The test-phase time cost is similar to that of the other machine learning methods, which satisfies the need for fast prediction of the number of active cases in practice.

TABLE 3. The forecasting results of GSA-ELM and other contrast models.

  Method                  MAPE    RMSE         Training (s)   Testing (s)
  SVR [53]                12.54   1156809.46   0.039          5.3 × 10−3
  DT [54]                 13.72   1242751.27   0.019          3.1 × 10−3
  KF [55]                 12.95   1204367.18   0.154          7.3 × 10−3
  KNN [56]                10.47   845803.09    0.011          1.6 × 10−2
  ANN [57]                10.76   813295.43    6.42           2.8 × 10−3
  Ridge Regression [58]   8.94    705551.56    0.115          4.0 × 10−3
  ELM [18]                10.19   595782.92    0.033          1.9 × 10−3
  GSA-ELM                 7.79    474522.09    482.53         2.4 × 10−3

  • Note: Bold values denote the best one among that column.

In addition, to better demonstrate the performance of our model while avoiding overlapping forecast lines that become indistinguishable at large scale, we selected several periods in which the number of active cases exploded for presentation. In Figure 4, the green line represents the true values and the red line the forecasts of GSA-ELM; the forecasts of GSA-ELM essentially coincide with the true values, while the other six models show some discrepancy. The GSA-ELM model is accurate and stable in most circumstances, and its performance is better than that of the other conventional machine learning forecasting methods. In summary, our model offers more advanced performance and faster prediction response than the other methods for forecasting future global active cases.

Figure 4: Six examples demonstrating that our model outperforms the state-of-the-art models.

It is worth noting that the proposed model can be deployed online, with model training and active-case prediction performed in parallel. First, we train the model on existing active-case data and deploy it online for prediction; in this setting the model completes the prediction task in very little time compared with other methods. Second, prediction capability is further enhanced by collecting the latest daily active-case reports worldwide to continuously train and optimise the internal parameters of our model, and the newly trained parameters are uploaded in parallel while the system runs online. Finally, the ELM with updated parameters makes more accurate and faster predictions of the future number of active COVID-19 cases. The model can thus help government agencies and related organisations make appropriate epidemic-prevention decisions quickly and further prevent the spread of the epidemic in the population.

4.5 Dataset division proportion

We split the overall data into a training set and a test set, used respectively for parameter learning and for evaluating the model's predictive capability, and analysed the impact of different splitting ratios on prediction performance. The detailed results are presented in Table 4. When the proportion of the training set is at most 80%, results improve as that proportion increases; in particular, the RMSE and MAPE decrease by 10.9% and 6.6%, respectively, when the training-to-test ratio reaches 70%:30%. This is because a larger training set allows the model to be trained more adequately and to learn more complex nonlinear patterns, improving performance and robustness to data noise. The metrics do not improve further when the proportion of the training set reaches 90%: a very large training set yields a model close to one trained on the full sample, increasing the possibility of data leakage, and the model is prone to overfitting, so it performs poorly on the unseen test set.

TABLE 4. The results of using different division proportions of the training and testing sets.

  Metric   60% : 40%   70% : 30%   80% : 20%   90% : 10%
  MAPE     8.62        8.05        7.79        7.87
  RMSE     689373.51   614400.75   474522.09   503040.32

  • Note: Bold values denote the best result.

4.6 Analysis of variance (ANOVA) test

To determine whether there is a significant difference between our model and the other methods, we performed an ANOVA test; Table 5 shows the results, and a sketch of the test follows the table. It is generally accepted that a p-value below 0.05 means the difference between two models is statistically significant. All p-values in the table are below 0.05, indicating that the improvements of our model over the other methods are statistically significant.

TABLE 5. The p-values of the different methods in the ANOVA test.

  Method    SVR           DT            KF            KNN           RidgeRegression   ELM
  p-value   8.45 × 10−5   1.91 × 10−6   1.34 × 10−6   3.13 × 10−5   4.20 × 10−4       3.96 × 10−4
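A sketch of the ANOVA test with SciPy, using illustrative placeholder runs; substitute the per-run MAPE values collected from the repeated experiments (e.g. 100 runs per model).

```python
import numpy as np
from scipy import stats

# Illustrative placeholders, not real results.
rng = np.random.default_rng(0)
gsa_elm_runs = rng.normal(7.8, 0.3, size=100)    # e.g. 100 GSA-ELM runs
baseline_runs = rng.normal(10.2, 0.5, size=100)  # e.g. 100 baseline runs

f_stat, p_value = stats.f_oneway(gsa_elm_runs, baseline_runs)
print(f"F = {f_stat:.2f}, p = {p_value:.2e}")  # p < 0.05: significant difference
```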

5 DISCUSSION

The random initialisation of the extreme learning machine's internal network parameters leads to suboptimal prediction. In this paper, a gravitational search algorithm searches for the optimal solution to the COVID-19 active-case prediction task by simulating the gravitational motion of a swarm of particles. This effectively prevents the ELM from falling into local optima and improves its predictive ability on nonlinear and unstable data. The experimental findings also show that our approach improves the robustness of the model while keeping its running-time overhead acceptable. The gravitational search algorithm has advantages over traditional optimisation algorithms in efficiently handling nonlinear functions and high-dimensional search-space optimisation problems, and it shows good search performance compared with other optimisation models. However, only the current position information plays a role in its iterations, so the gravitational search algorithm lacks memory and can still fall into a local optimum, a difficulty that optimisation search methods often encounter; as a result, using it with the ELM may occasionally produce unsatisfactory predictions. To break these limitations, other optimisation search models could be used instead, or other optimisation algorithms could initialise the parameter values of the GSA to accelerate the overall movement of the population and give the algorithm a stronger search capability.

Several studies in the literature focus on optimising the network model to improve its immunity to noise. For example, Cui et al. [59] proposed a two-stage hybrid learning model that searches the initial parameter values of the GSA in a data-driven manner with the PSO algorithm to improve the efficiency of the global-optimum search. Yin et al. [60] introduced a modified GSA with crossover (CROGSA), whose crossover-based search scheme exploits the promising knowledge extracted from the global optimum positions obtained so far to improve exploitation. We are also collecting more COVID-19-related data for the learning and training of the GSA-ELM model, to improve its generalisation capability for deployment in real applications. This will greatly assist the work of outbreak prevention and control authorities and facilitate the implementation of targeted prevention and control policies.

6 CONCLUSION

In this paper, we propose a hybrid learning model, GSA-ELM, for forecasting global active COVID-19 cases. To predict the complex and multifactorial active-case counts, we use a gravitational search algorithm to search for the globally optimal parameters of an extreme learning machine. The experimental results demonstrate the reliability of the model in predicting real-life active cases compared with other state-of-the-art methods, and its excellent generalisation ability makes it well suited to application. In addition, our model can be deployed online: it is trained in a data-driven manner, and training and active-case prediction can run in parallel. In the future, the model can help governments and related organisations make appropriate epidemic-prevention plans as early as possible, to control the spread of the epidemic in the population and protect people's lives.

AUTHOR CONTRIBUTION

Boyu Huang: Methodology; Software. Youyi Song: Formal analysis. Zhihan Cui: Validation. Haowen Dou: Data curation. Dazhi Jiang: Writing – review & editing. Teng Zhou: Funding acquisition; Project administration. Jing Qin: Supervision.

ACKNOWLEDGEMENT

This work was supported by the National Natural Science Foundation of China (No. 61902232), the 2022 Guangdong Basic and Applied Basic Research Foundation (No. 2022A1515011590), the Project of Strategic Importance of The Hong Kong Polytechnic University (No. 1-ZE2Q), and the 2020 Li Ka Shing Foundation Cross-Disciplinary Research Grant (No. 2020LKSFG05D). All authors of this paper would like to thank Mrs. Zhizhe Lin, Ph.D. candidate at Hainan University, for her help with this paper.

CONFLICT OF INTEREST STATEMENT

The authors declare that they have no conflict of interest.

DATA AVAILABILITY STATEMENT

The dataset and source code generated and/or analysed during the current study are available from the corresponding author upon reasonable request.