Volume 13, Issue 6 p. 1023-1032
Research Article
Open Access

Hybrid dual Kalman filtering model for short-term traffic flow forecasting

Teng Zhou (Corresponding Author)

Department of Computer Science, College of Engineering, Shantou University, Shantou, Guangdong, People's Republic of China

Center for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hong Kong
Dazhi Jiang

Department of Computer Science, College of Engineering, Shantou University, Shantou, Guangdong, People's Republic of China
Zhizhe Lin

Affiliated Shantou Hospital of Sun Yat-sen University, Shantou Central Hospital, Shantou, Guangdong, People's Republic of China
Guoqiang Han

School of Computer Science and Engineering, South China University of Technology, People's Republic of China
Xuemiao Xu

School of Computer Science and Engineering, South China University of Technology, People's Republic of China

Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information, Guangzhou, Guangdong, People's Republic of China
Jing Qin

Center for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hong Kong
First published: 12 February 2019

Abstract

Short-term traffic flow forecasting is a fundamental yet challenging task: it is required for the successful deployment of intelligent transportation systems, while the traffic flow changes dramatically over time. This study presents a novel hybrid dual Kalman filter (H-KF2) for accurate and timely short-term traffic flow forecasting. To achieve this, the H-KF2 first models the propagation of the discrepancy between the predictions of the traditional Kalman filter and the random walk model. By estimating the a posteriori state of the prediction errors of both models, the calibrated discrepancy is exploited to compensate for the preliminary predictions. The H-KF2 runs in time and space competitive with the traditional Kalman filter. Four real-world datasets and various experiments are employed to evaluate the authors' model. The experimental results demonstrate that the H-KF2 outperforms state-of-the-art parametric and non-parametric models.

1 Introduction and literature review

Accurate traffic flow forecasting is essential for many intelligent transportation systems. Such prospective forecasts not only feed into dynamic traffic assignment (DTA), but also provide routing information that the motoring public can use in their travel plans [1]. In addition, proactive forecasting can mitigate the adverse effects of traffic problems and inform transportation network design.

The increasing availability of traffic flow data encourages researchers to develop more accurate forecasting methodologies that work in real time [2]. They can be grouped into two classes: parametric approaches and non-parametric ones [3]. Parametric approaches include the historical average (HA) [4], Kalman filtering methods [5-7], the autoregressive integrated moving average (ARIMA) model [8-10], seasonal ARIMA [11-13], multivariate time-series models [14-16], spectral analysis [17, 18] and the structural time-series model [19, 20], among others; non-parametric ones include the artificial neural network (ANN) [21, 22], non-parametric regression models [23], fuzzy logic methods [24-26] and support vector regression (SVR) [27-30]. Parametric models capture all their information about the traffic status within their parameters, while non-parametric ones use an unspecified number of parameters to store the subtler aspects of the traffic data implicitly, inevitably including noise. Hence, non-parametric models take more time and computational effort to learn optimal parameters.

In order to process large-scale traffic flow data in real-time, several parametric methods, including the family of Kalman filters, have been integrated into intelligent transportation systems. A family of Kalman filters has been developed for short-term traffic flow forecasting. The extended Kalman filter (EKF) approximates the non-linear system with first-order linearisation [31]. The second-order EKF estimates the state by maintaining the second-order terms of the Taylor expansions of the state and measurement propagations [32]. The unscented Kalman filter approximates the dynamic system based on the unscented transform [33]. The cubature Kalman filter (CKF) approximates the Gaussian filter using a spherical–radial cubature rule [34]. In this paper, a novel hybrid dual Kalman filter is developed based on the discrepancies of the estimations of the Kalman filter and the random walk (RW) model.

The Kalman family of filters comprises efficient and effective parametric techniques that have been applied to real-world traffic flow data [5, 6, 35-37]. The Kalman filter works in an optimal recursive way with competitive performance for linear dynamic systems [38-40]. van Hinsbergen et al. [36] proposed a localised EKF for real-time traffic state estimation, and their experiments showed that Kalman filter based models are fast enough for large-scale real-time applications. Wang et al. [7] proposed an adaptive Kalman filter based algorithm for short-term traffic flow forecasting; robust predictions with relatively small errors in a large-scale real-time environment demonstrated the strength of their proposal. Wang et al. [37] proposed an improved Kalman filter algorithm that balances the ratio of measurements to model results based on their variances; evaluation on real field data indicated high real-time accuracy. However, Kalman filter based methods are prone to overshooting in practice [7], which often occurs during commuting hours when congestion happens. Some researchers introduced the EKF to deal with this problem [36, 41, 42]. However, the Kalman gain is optimal if and only if the transition and observation models are linear in their arguments, a condition the extended versions do not meet; thus, the Kalman gain is no longer optimal. A small bias in the initial state will quickly lead to divergence of the model, and the covariance matrix tends to become ill-determined [43].

Another feasible way is to combine smoothing techniques [44-46], such as wavelet denoising, to preprocess traffic flow data and improve the forecasting accuracy [5]. In these models, the current traffic flow is represented as a weighted linear combination of the recent traffic flow. The weight is the state of the dynamic linear system and the traffic flow is regarded as the observation. After the prior state is estimated, the traffic flow prediction is the weighted combination of the present and recent traffic flow. However, smoothing techniques cannot fundamentally overcome the overshooting. Fortunately, not all traffic flow forecasting models are prone to overshooting, e.g. the RW model. Once the Kalman filter tends to overshoot, the discrepancy between the prediction of the Kalman filter and that of the RW model will vary.

Motivated by the variation of this discrepancy, we propose a novel model termed the hybrid dual Kalman filter (H-KF2) for traffic flow forecasting. The low-level Kalman filtering model is inherited from Xie et al. [5]. Another dynamic system is then introduced to model the propagation of the discrepancies, namely the discrepancy between the expectation of the traffic flow and the prediction of the Kalman filter, and the discrepancy between the traffic flow expectation and the RW model. The observation of this system is the discrepancy between the prediction of the Kalman filter and that of the RW model. This Kalman filter estimates the discrepancies of the preliminary predictions, and the optimal discrepancy is exploited to compensate for the preliminary prediction. In addition, the model works in constant time complexity and limited memory, which makes it suitable for large-scale real-time traffic flow forecasting. Finally, extensive experiments demonstrate the superior performance of the proposal.

2 Methodology

As discussed above, when the Kalman filter tends to overshoot, the discrepancy between the prediction of the Kalman filter and the prediction of the RW model varies. Motivated by this observation, the evolution of traffic flow is modelled as a dynamic linear system. A Kalman filter, named the low-level Kalman filter, is employed to estimate the prior state of the system, and the prediction of the traffic flow is derived from this prior state. Then the discrepancy vector, comprising the discrepancy between the traffic flow expectation and the prediction of the low-level Kalman filter and the discrepancy between the expectation and the RW model, evolves as the state of another dynamic linear system. A high-level Kalman filter is introduced to estimate the a posteriori state of the latter dynamic system. The estimated discrepancies are then exploited to improve the final prediction. The basic idea of the proposed framework is illustrated in Fig. 1. In this figure, nodes with different colours represent different data types, and the intensity of the colour denotes the certainty. For example, given the a posteriori state of the first dynamic system $\hat{\mathbf{x}}_{t-1}^{+}$ at time $t-1$, we can estimate the prior state $\hat{\mathbf{x}}_{t}^{-}$ using the low-level Kalman filter. Here, the superscript '+' denotes an estimate made after the observation of the traffic flow $v_{t-1}$ at time $t-1$, while '−' denotes an estimate made before the observation at time $t$.
The prediction of the low-level Kalman filter can be deduced as $\hat v_t^{\mathrm{KF}} = \mathbf{h}_t^{\top}\hat{\mathbf{x}}_t^{-}$, where $\mathbf{h}_t$ collects the recent traffic flow. Similarly, $\mathbf{d}_t$ is a two-dimensional vector: the first element is the discrepancy between the prediction of the RW model $\hat v_t^{\mathrm{RW}}$ and the traffic flow expectation $E[v_t]$, and the second is the discrepancy between the traffic flow expectation $E[v_t]$ and the prediction of the low-level Kalman filter $\hat v_t^{\mathrm{KF}}$. The discrepancy between the two predictions is denoted as $z_t = \hat v_t^{\mathrm{RW}} - \hat v_t^{\mathrm{KF}}$. Since $z_t$ is observable in the second dynamic system, we can estimate the a posteriori state $\hat{\mathbf{d}}_t^{+}$, which is used to improve the final prediction $\hat v_t$.

Basic idea of H-KF2. The flowchart is read from left to right, and from top to bottom chronologically. In other words, the nodes on the left are always estimated or observed before the nodes on the right, and the nodes on the top are always estimated or observed before the nodes on the bottom. The lime green nodes of the first layer are the observed traffic flow. The light pink nodes of the second layer are the prior state estimations of the low-level Kalman filter. The salmon nodes of the third layer are the a posteriori state estimations of the low-level Kalman filter after the traffic flow is observed. The spring green nodes of the fourth layer are the predictions of the low-level Kalman filter. The pale green nodes of the fifth layer are the predictions of the RW model. The violet red nodes of the sixth layer are the observations of the discrepancies between the models. The light purple nodes of the seventh layer are the prior state estimations of the discrepancies. The purple nodes of the eighth layer are the a posteriori state estimations of the discrepancies. The lime nodes of the ninth layer are the final predictions of the H-KF2. The low-level Kalman filter takes place above the upper dashed line. The RW model takes place between the upper dashed line and the lower dashed line. The high-level Kalman filter takes place below the lower dashed line

2.1 Low-level Kalman filter

In the first dynamic linear system, the traffic flow is modelled as the weighted combination of the recent traffic flow. The weight is regarded as the state of the system. The current traffic flow is the observation of this system. The system is presented by the following equations:
$\mathbf{x}_t = A\,\mathbf{x}_{t-1} + \mathbf{w}_t$ (1)
$v_t = \mathbf{h}_t^{\top}\mathbf{x}_t + \epsilon_t$ (2)
Equation (1) is the state innovation of the system and (2) is the system observation. Here, $\mathbf{h}_t = (v_{t-1}, v_{t-2}, \dots, v_{t-n})^{\top}$ is the traffic flow from time $t-n$ to $t-1$, and $\mathbf{x}_t$ is the state whose elements are the weights for the corresponding traffic flow. $\mathbf{w}_t$ and $\epsilon_t$ are random variables for the process and observation noise, respectively. They are assumed to be independent Gaussian
$\mathbf{w}_t \sim \mathcal{N}(\mathbf{0}, Q)$ (3)
$\epsilon_t \sim \mathcal{N}(0, R)$ (4)
$E[\mathbf{w}_t\,\mathbf{w}_s^{\top}] = Q\,\delta_{ts}$ (5)
$E[\epsilon_t\,\epsilon_s] = R\,\delta_{ts}$ (6)
$E[\mathbf{w}_t\,\epsilon_s] = \mathbf{0}$ (7)
where $Q$ is the process error covariance, $R$ is the observation error covariance and $\delta_{ts}$ is the Kronecker delta function.

The one-step prediction is derived from the prior estimation of the state as $\hat v_t = \mathbf{h}_t^{\top}\hat{\mathbf{x}}_t^{-}$. Lan [47], Xie et al. [5] and Wang et al. [7] made similar assumptions.

The low-level Kalman filter estimates the prior state of this dynamic linear system by recursively updating the a posteriori estimation $\hat{\mathbf{x}}_t^{+}$ of the state and the covariance of the estimation error $P_t^{+}$ by the following equations:
$\hat{\mathbf{x}}_t^{-} = A\,\hat{\mathbf{x}}_{t-1}^{+}$ (8)
$P_t^{-} = A\,P_{t-1}^{+} A^{\top} + Q$ (9)
$K_t = P_t^{-}\mathbf{h}_t\,(\mathbf{h}_t^{\top} P_t^{-}\mathbf{h}_t + R)^{-1}$ (10)
$\hat{\mathbf{x}}_t^{+} = \hat{\mathbf{x}}_t^{-} + K_t\,(v_t - \mathbf{h}_t^{\top}\hat{\mathbf{x}}_t^{-})$ (11)
$P_t^{+} = (I - K_t\,\mathbf{h}_t^{\top})\,P_t^{-}$ (12)
where $I$ is the identity matrix.
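As an illustration, the standard predict/update recursion of a Kalman filter of this form can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: it assumes a scalar observation, an identity propagation matrix $A$ (as in the experimental setup), and our own function and variable names.

```python
import numpy as np

def kf_step(x_post, P_post, h, v, Q, R):
    """One predict/update cycle of a low-level Kalman filter of this form.

    x_post : (n,)   a posteriori state (weights) at time t-1
    P_post : (n,n)  a posteriori estimation-error covariance at time t-1
    h      : (n,)   the n most recent flows (the observation vector)
    v      : float  observed traffic flow at time t
    Q, R   : process / observation noise covariances (R is a scalar here)
    """
    # Time update: the state propagation matrix is assumed to be the identity.
    x_prior = x_post
    P_prior = P_post + Q
    # One-step traffic forecast from the prior state.
    v_pred = h @ x_prior
    # Measurement update.
    s = h @ P_prior @ h + R                  # innovation variance (scalar)
    K = P_prior @ h / s                      # Kalman gain
    x_new = x_prior + K * (v - v_pred)
    P_new = (np.eye(len(h)) - np.outer(K, h)) @ P_prior
    return x_new, P_new, v_pred
```

With `R = 0` the filter trusts the measurement completely, so after the update the fitted weights reproduce the observation exactly; this matches the noise-free measurement setting used later in the experiments.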
Yang et al. [48] recursively adjusted the weight off-line and found that the covariance matrix is important to the system. To estimate the parameters $\{Q, R, \boldsymbol{\mu}_0, P_0\}$, we assume that the observation noise matrix $R$ is time invariant for a short period. The initial state $\mathbf{x}_0$ has mean $\boldsymbol{\mu}_0$ and covariance $P_0$. The EM algorithm maximises the log-likelihood function (13) in the M-step
$\ln L = -\tfrac{1}{2}\sum_{t=1}^{T}\left[\ln \Sigma_t + (v_t - \mathbf{h}_t^{\top}\hat{\mathbf{x}}_t^{-})^{2}/\Sigma_t\right] + \mathrm{const}, \quad \Sigma_t = \mathbf{h}_t^{\top}P_t^{-}\mathbf{h}_t + R$ (13)
The updated parameters are then used for the expectation in the E-step, recursively; see Kashyap [49] for more details. In addition, the wavelet denoising technique of [5] is introduced to preprocess the traffic flow data.

2.2 High-level Kalman filter

The Kalman filter is a good predictor for the near future, while the RW model is informative about the most recent state of the traffic flow [50]. The Kalman filtering model is prone to overshooting, while the RW model is not. However, the discrepancy between the prediction of the Kalman filter and the ground truth, as well as the discrepancy between the prediction of the RW model and the ground truth, is difficult to estimate accurately. Fortunately, the discrepancy between the prediction of the Kalman filter and that of the RW model is observable. We can therefore co-estimate both discrepancies by modelling them as a dynamic system. After the informative observation of the discrepancy between these two predictors, the estimation can be further improved according to the predictions of both models.

In order to model the propagation of the discrepancies, two definitions are given in (14) and (15), where $E[v_t]$ is the expectation of the traffic flow at time $t$. Obviously, $d_t^{(1)}$ is Gaussian with mean $\mu_1$ and variance $\sigma_1^2$, and $d_t^{(2)}$ is also Gaussian with mean 0 and variance $\sigma_2^2$.
$d_t^{(1)} = \hat v_t^{\mathrm{RW}} - E[v_t]$ (14)
$d_t^{(2)} = E[v_t] - \hat v_t^{\mathrm{KF}}$ (15)
In (14) and (15), the expectation enters the two definitions with opposite signs. If we add them together, the unknown $E[v_t]$ is cancelled out. In particular, the sum is also Gaussian, which will be proven in Section 5.
Readers should notice that the discrepancy is Gaussian at every individual moment. The evolution of this discrepancy can be modelled by the propagation matrix $F$. As analysed, the observation is the discrepancy between the predictions of these two models. Then, the second dynamic system can be modelled as (16) and (17), given the initial state $\mathbf{d}_0$.
$\mathbf{d}_t = F\,\mathbf{d}_{t-1} + \mathbf{w}'_t$ (16)
$z_t = H\,\mathbf{d}_t + \epsilon'_t$ (17)
where $\mathbf{d}_t = (d_t^{(1)}, d_t^{(2)})^{\top}$, $H = (1, 1)$ and $z_t = \hat v_t^{\mathrm{RW}} - \hat v_t^{\mathrm{KF}}$.
Then, the high-level Kalman filter is employed to estimate the a posteriori state of this dynamic system. The optimality of the estimation of the state and covariance by this Kalman filter will be proven in Section 6.
$\hat{\mathbf{d}}_t^{-} = F\,\hat{\mathbf{d}}_{t-1}^{+}$ (18)
$S_t^{-} = F\,S_{t-1}^{+} F^{\top} + Q'$ (19)
$K'_t = S_t^{-} H^{\top}\,(H S_t^{-} H^{\top} + R')^{-1}$ (20)
$\hat{\mathbf{d}}_t^{+} = \hat{\mathbf{d}}_t^{-} + K'_t\,(z_t - H\,\hat{\mathbf{d}}_t^{-})$ (21)
$S_t^{+} = (I - K'_t H)\,S_t^{-}$ (22)
where $F$ is the state propagation matrix, $K'_t$ is the Kalman gain of this Kalman filter, $S_t$ is the covariance of the estimation error, $Q'$ is the covariance of the process noise and $R'$ is the covariance of the observation noise. The notation '+' means the estimation is made after the observation, while '−' means before the observation. Initially, $\hat{\mathbf{d}}_0^{+} = \mathbf{0}$. If we are quite confident about the estimation $\hat{\mathbf{d}}_0^{+}$, then $S_0^{+}$ is relatively small, otherwise large. Referring to (16), $\mathbf{w}'_t = \mathbf{d}_t - F\,\mathbf{d}_{t-1}$. Since $F$ is the identity matrix, $\mathbf{w}'_t = \mathbf{d}_t - \mathbf{d}_{t-1}$. Then $Q'$ can subsequently be estimated from the historical predictions. The covariance of the observation noise $R'$ is set to 0.

From the definitions in (14) and (15), once we have the a posteriori estimation $\hat{\mathbf{d}}_t^{+}$, the final prediction of the traffic flow at time $t$ is easily deduced as $\hat v_t = \hat v_t^{\mathrm{KF}} + \hat d_t^{+(2)}$, where $\hat v_t^{\mathrm{KF}} = \mathbf{h}_t^{\top}\hat{\mathbf{x}}_t^{-}$. The mathematical explanation of this recursive model can be found in Section 6.
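One step of this high-level filter, together with the compensation of the preliminary prediction, can be sketched as follows. This is an illustrative sketch under the settings described above (identity propagation matrix, zero observation noise, two-dimensional discrepancy state); the function name and layout are our own.

```python
import numpy as np

def hkf2_correct(d_post, S_post, v_kf, v_rw, Q2):
    """High-level filter step on the discrepancy state, plus compensation.

    d_post : (2,)   a posteriori discrepancy state at t-1:
                    [rw prediction - expectation, expectation - kf prediction]
    S_post : (2,2)  its estimation-error covariance
    v_kf   : float  preliminary prediction of the low-level Kalman filter
    v_rw   : float  preliminary prediction of the RW model
    Q2     : (2,2)  process-noise covariance of the discrepancy system
    """
    H = np.array([1.0, 1.0])          # the two discrepancies sum to z
    d_prior = d_post                  # propagation matrix is the identity
    S_prior = S_post + Q2
    z = v_rw - v_kf                   # the observable discrepancy
    s = H @ S_prior @ H               # innovation variance (obs. noise = 0)
    K = S_prior @ H / s
    d_new = d_prior + K * (z - H @ d_prior)
    S_new = (np.eye(2) - np.outer(K, H)) @ S_prior
    v_final = v_kf + d_new[1]         # compensate with the estimated KF error
    return d_new, S_new, v_final
```

Note how the gain splits the observed gap between the two error components in proportion to their process-noise variances: the less trustworthy predictor absorbs more of the discrepancy.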

2.3 Summary

The algorithm is summarised in Algorithm 1 (see Fig. 2). As can be seen, the method works in constant time with limited memory. Thus, it is suitable for large-scale real-time traffic flow forecasting.

Algorithm 1: H-KF2

3 Case study

In this section, the traffic flow data from four motorways A1, A2, A4 and A8 ending on the ring road (A10 motorway) of Amsterdam are used for the empirical study.

3.1 Data description

The real-world data were collected from four motorways, namely A1, A2, A4 and A8, which end on the ring road of Amsterdam.
  • The A1 motorway connects the city of Amsterdam with the German border and is also part of a European route: the European route E30 follows the A1 motorway from the interchange Hoevelaken in the Netherlands. The A1 motorway carried the first barrier-separated high-occupancy vehicle (HOV) 3+ lane in Europe.
  • The A2 motorway is one of the busiest highways in the Netherlands, which connects the city of Amsterdam and the Belgian border.
  • The A4 motorway is part of the Rijksweg 4, which runs from Amsterdam towards the Belgian border. The A4 motorway has priority from the eastern direction until the interchange De Nieuwe Meer, then travels to the southeast.
  • The A8 motorway starts from the A10 motorway at the interchange Coenplein and ends near Zaandijk, less than 10 km away.

The four measurement sites are located on the inflows of the motorways, a short distance before the merge points with the ring road. Fig. 3 shows the road network of Amsterdam [3]. The data were collected by MONICA sensors from 20 May 2010 until 24 June 2010 and aggregated at 1-min intervals.

Four motorways namely A1, A2, A4, and A8, which end on the ring road of Amsterdam

The raw data are mixed with incorrect measurements, which are zeros or negative values lasting for a long period. The incorrect data were corrected by averaging the measurements at the same moments in other weeks.
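The correction rule above can be sketched as a same-slot average over the remaining weeks. This is a minimal sketch under our own assumptions: the flows are arranged as a weeks × slots array, a reading is "incorrect" if it is zero or negative, and the function name is hypothetical.

```python
import numpy as np

def repair_flows(flows):
    """Replace zero/negative readings with the mean of the same time slot
    in the weeks where that slot has a valid reading."""
    flows = np.asarray(flows, dtype=float)
    repaired = flows.copy()
    bad = flows <= 0
    for w, s in zip(*np.where(bad)):
        valid = flows[~bad[:, s], s]     # same slot, other (valid) weeks
        if valid.size:
            repaired[w, s] = valid.mean()
    return repaired
```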

3.2 Control models description

Six frequently used forecasting models are involved in the evaluation; these models are often integrated into intelligent transportation systems. The hybrid particle swarm optimisation SVR model is detailed in [51]; it uses the first four weeks of data for training and the rest for evaluation. The HA model [30] and the Kalman filtering model [5] (the low-level Kalman filter) are detailed above. The RW model [30] simply regards the last measurement as the prediction. The HA and RW models are often used as baselines for new models. The autoregression (AR) model is trained with data from the first four weeks, and the fifth week is used for evaluation. The ANN is detailed in Zhu et al. [52].

Support vector machine regression: For the support vector machine regression model, several parameters need to be set beforehand. The regression horizon is set to the same value as for the AR model. We use the radial basis function (RBF) as the kernel type in this study. The cost parameter C is set to the maximum difference between traffic flow values. The width parameter $\sigma$ of the RBF kernel and the $\varepsilon$-insensitive loss are determined by particle swarm optimisation; the $\varepsilon$-insensitive loss for the SVR is 1 in this study. The settings of the remaining models are described below.

HA: This model predicts, for a given time of day, the average of the measurements at the same time on the same day of previous weeks.

RW: This model simply predicts the traffic flow at the next moment to be equal to the current condition.

Kalman filtering: A wavelet denoising procedure proposed by Xie et al. [5] is employed to preprocess the traffic flow data, using Daubechies 4 as the mother wavelet as suggested in [5]. The variance of the measurement noise is 0, since we regard the measurements as correct. The initial state and the covariance matrix of the initial state estimation error follow Xie et al. [5], with n set to 8.

AR: The AR model is a representation of a random process and has been widely used in traffic flow forecasting due to the randomness of the traffic flow. In an AR model of order p, the current traffic flow is represented by a weighted combination of the values going back p periods, plus a random disturbance in the current period. In this regard, the order p is critical for AR models: if it is too low, the model cannot capture the temporal dependency; if it is too high, more coefficients need to be estimated and additional errors are consequently introduced. The order in our experiment is set to 8 by cross-validation on our training data.
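An AR(p) baseline of this form can be fitted by ordinary least squares. The sketch below is illustrative only (the paper does not specify its estimation procedure); the function names and the intercept term are our own choices.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of v_t = a_1 v_{t-1} + ... + a_p v_{t-p} + c.
    Returns the coefficients (a_1, ..., a_p, c)."""
    v = np.asarray(series, dtype=float)
    # Each row holds the p lagged values preceding one target observation.
    X = np.column_stack([v[p - i:len(v) - i] for i in range(1, p + 1)])
    X = np.column_stack([X, np.ones(len(v) - p)])   # intercept column
    coef, *_ = np.linalg.lstsq(X, v[p:], rcond=None)
    return coef

def ar_predict(history, coef):
    """One-step-ahead forecast from the most recent p observations."""
    p = len(coef) - 1
    lags = np.asarray(history, dtype=float)[-p:][::-1]   # v_{t-1}, ..., v_{t-p}
    return lags @ coef[:p] + coef[-1]
```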

ANN: We employ the ANN introduced in Zhu et al. [52]. The number of hidden layers is set to 1, the optimisation goal is 0.001, the spread is 2000 and the number of nodes in the hidden layer is 40. Most of the network parameters are consistent with [52].

3.3 Evaluation criteria

Some frequently used criteria are employed to evaluate the performance of the proposed approach. The root mean square error (RMSE) measures the average difference between the predictions of a model and the measurements of the system being modelled. The mean absolute percentage error (MAPE) expresses the differences as percentages. The absolute percentage error (APE) expresses the error at every moment as a percentage
$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}(v_k - \hat v_k)^2}$ (23)
$\mathrm{MAPE} = \frac{1}{N}\sum_{k=1}^{N}\left|\frac{v_k - \hat v_k}{v_k}\right| \times 100\%$ (24)
$\mathrm{APE}_k = \left|\frac{v_k - \hat v_k}{v_k}\right| \times 100\%$ (25)
where $v_k$ is the ground truth and $\hat v_k$ is the prediction at time $k$.
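The three criteria (23)-(25) translate directly into code; a minimal sketch with our own function names:

```python
import numpy as np

def rmse(y, yhat):
    """Root mean square error, eq. (23)."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mape(y, yhat):
    """Mean absolute percentage error in percent, eq. (24)."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs((y - yhat) / y)) * 100.0)

def ape(y_k, yhat_k):
    """Absolute percentage error of a single moment, eq. (25)."""
    return abs((y_k - yhat_k) / y_k) * 100.0
```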

3.4 Experiments setup

The collected data are divided into two parts: the first four weeks are used for training and the rest for evaluation. The process noise covariance $Q$ of the low-level Kalman filter is estimated by the EM algorithm detailed in Section 2.1 using the first four weeks of data. The measurement noise covariance $R$ of the low-level Kalman filter is set to 0. The process matrix $A$ of the low-level Kalman filter is set to the identity matrix $I$. The initial covariance of the estimation error $P_0$ and the initial system state are set as for the Kalman filtering model in Section 3.2.

The measurements are aggregated into 1, 5 and 10 min intervals. The first-week measurements of A1 are displayed in Fig. 4, which shows many fluctuations in the 1- and 5-min aggregations. Traffic flow data aggregated at high resolution tend to be noisy, which decreases the accuracy of the forecasting models. On the other hand, traffic flow data of too low a resolution provide little information for forecasting [1]. Another study reported that the resolution of the input data should be equal to that of the data to be predicted [53]. Since we aim at short-term traffic flow forecasting 10 min ahead, and this work is not intended to forecast minute-wise fluctuations, the 10-min average aggregations are used for evaluation.
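The aggregation itself is a simple block average over the 1-min readings; a sketch (the handling of an incomplete trailing bin is our assumption):

```python
import numpy as np

def aggregate(flow_1min, minutes):
    """Average consecutive 1-min flow readings into coarser bins
    (e.g. minutes=5 or minutes=10)."""
    v = np.asarray(flow_1min, dtype=float)
    n = (len(v) // minutes) * minutes        # drop an incomplete trailing bin
    return v[:n].reshape(-1, minutes).mean(axis=1)
```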

First-week measurements of A1 aggregated into 1, 5, and 10 min. There are too many fluctuations in the 1-min and 5-min resolutions. The aggregations in 10-min are relatively smooth
For the high-level Kalman filter, the process matrix $F$ is set to the identity matrix, since we assume the predictions of both models are as precise as they were in the previous interval and the discrepancies are caused by the process noise. The covariance matrix of the process noise $Q'$ needs to be estimated from the historical predictions of the low-level Kalman filter and the RW model. We calculate the prediction errors of both models over the first four weeks of data. Their corresponding process-noise covariance matrices for the first four weeks of A1, A2, A4 and A8 are
[four 2 × 2 empirical covariance matrices of the prediction errors]
respectively. As shown in the next subsection, the diagonal entries relate to the performance of the predictors, and the off-diagonal entries indicate the correlation between the prediction errors of the two models. Furthermore, we present an analysis of variance (ANOVA) on the prediction errors of these two models for the first four weeks in order to show their different characteristics and to support our assumptions. The ANOVA table is listed in Table 1, and the corresponding box graphs are shown in Fig. 5. From Table 1, the p-values are much smaller than the common significance level of 0.05, so we can reject the null hypothesis and conclude that the means of the two models' errors are statistically different. Thus, it is reasonable to use the predictions of the two models as the preliminary predictions to boost. In Fig. 5, we can see that the means of the prediction errors of these two models are very close to 0, which supports the assumption used in Section 6.
Table 1. One-way ANOVA of the Kalman filter and the RW model
     Source    SS              df    MS            F      p
A1 columns 1,685,318.36 1 1,685,318.36 20.34 0.000007
error 666,805,383.27 8046 82,874.15
total 668,490,701.63 8047
A2 columns 1,750,149.09 1 1,750,149.09 31.32 0.00000002
error 449,558,440.01 8046 55,873.53
total 451,308,589.10 8047
A4 columns 1,470,660.99 1 1,470,660.99 20.24 0.000007
error 584,733,671.62 8046 72,673.83
total 586,204,332.62 8047
A8 columns 375,921.70 1 375,921.70 11.99 0.0005
error 252,293,036.27 8046 31,356.33
total 252,668,957.97 8047
Box graphs of the prediction errors of the low-level Kalman filter and the RW model for the first four weeks. The vertical axis is the prediction error of the RW model and the low-level Kalman filter for the first four weeks of A1, A2, A4 and A8, shown in Figs. 5a-d, respectively
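A one-way, two-group ANOVA of the kind reported in Table 1 can be computed from the two error series alone. The sketch below implements the standard F statistic with NumPy only; the function name and the toy data in the usage are our own.

```python
import numpy as np

def one_way_anova(a, b):
    """One-way ANOVA F statistic for two groups (as in Table 1).

    Returns (F, df_between, df_within), i.e. the F value together with
    the 'columns' and 'error' degrees of freedom."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    grand = np.concatenate([a, b]).mean()
    # Between-group ('columns') and within-group ('error') sums of squares.
    ss_between = len(a) * (a.mean() - grand) ** 2 + len(b) * (b.mean() - grand) ** 2
    ss_within = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
    df_b, df_w = 1, len(a) + len(b) - 2
    F = (ss_between / df_b) / (ss_within / df_w)
    return F, df_b, df_w
```

With two groups, this F test is equivalent to a two-sample t test ($F = t^2$), which is why a significant F here supports the claim that the two models' mean errors differ.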

The covariance matrix of the observation noise is set to zero, and the covariance of the estimation error of the high-level Kalman filter is initialised to zero.
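With these settings (identity process matrix, zero observation noise, zero initial error covariance), one step of the high-level filter over the two-dimensional state of prediction errors can be sketched as below. The class name, the error-observation interface, and the use of a pseudo-inverse are our assumptions for illustration, not the authors' implementation:

```python
import numpy as np

class HighLevelKF:
    """Sketch of the high-level Kalman filter over the 2-D state of
    prediction errors (low-level KF error, RW error)."""

    def __init__(self, Q):
        self.A = np.eye(2)         # process matrix: errors persist between intervals
        self.Q = np.asarray(Q, dtype=float)  # process-noise covariance (historical errors)
        self.R = np.zeros((2, 2))  # observation noise set to zero (see text)
        self.x = np.zeros(2)       # a posteriori error state
        self.P = np.zeros((2, 2))  # estimation-error covariance, initially zero

    def predict_errors(self):
        # a priori propagation: x- = A x, P- = A P A^T + Q
        self.x_prior = self.A @ self.x
        self.P_prior = self.A @ self.P @ self.A.T + self.Q
        return self.x_prior

    def update(self, observed_errors):
        # gain K = P- (P- + R)^{-1}; pinv guards against a singular P-
        K = self.P_prior @ np.linalg.pinv(self.P_prior + self.R)
        innovation = np.asarray(observed_errors, dtype=float) - self.x_prior
        self.x = self.x_prior + K @ innovation
        self.P = (np.eye(2) - K) @ self.P_prior
        return self.x
```

The a priori error estimate from `predict_errors` is what compensates the preliminary predictions of the two low-level models; `update` is called once the actual traffic flow, and hence the true prediction errors, become available.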

3.5 Scenarios of study

To demonstrate the performance of the proposed H-KF2, we introduce several typical and difficult scenarios that often occur in road networks. Three scenarios are selected from the first day of the last week on the A1 motorway. Selecting the data from the same measurement point on a single day makes it more convincing that the forecasting results are not coincidental.

3.5.1 Scenario #1 – rush hour

The first scenario is the rush hour, when the traffic flow increases rapidly, as shown in Fig. 6a. The traffic flow grows from about 1300 vehicles per hour to over 2800 vehicles per hour. The low-level Kalman filter is prone to overshooting when the traffic flow increases quickly or drops suddenly, while the RW model, like most other forecasting models, can hardly catch these rapid fluctuations. In this scenario, our model acts as a trade-off between the two: the H-KF2 prevents both the overshooting of the Kalman filter and the falling-behind of the RW model. The corresponding APEs of the H-KF2, Kalman filter and RW model in this scenario are shown in Fig. 6b. As we can see, the H-KF2 performs better than the Kalman filter, and at times better than the RW model. The overall performance evaluated by the RMSE and MAPE is listed in Table 2. The RMSE of the H-KF2 is 2.34% better than that of the KF and 4.03% better than that of the RW, so the H-KF2 outperforms the other two models in such a difficult scenario. Such an improvement is important in intelligent transportation systems, since it alleviates the variance of the accuracy of the predictor, which is a critical issue in traffic management and traffic information publishing. Moreover, this model adds little time and space overhead to existing systems.

Table 2. Performance of models in scenario #1 – rush hour
H-KF2 KF RW
RMSE 198.78 203.44 206.80
MAPE 7.74 8.01 8.00
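The RMSE and MAPE figures reported in the scenario tables can be computed as in the following sketch; the function names are our own:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted flows."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes y_true != 0)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```

The RMSE penalises large deviations more heavily, while the MAPE is scale-free, which is why the text uses the MAPE when comparing whole weeks across motorways.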
Scenario #1 – rush hour. The traffic flow increases rapidly in Fig. 6a. Although all three models can hardly predict the traffic flow very accurately, the H-KF2 acts as a trade-off between the Kalman filter and the RW model. Fig. 6b shows the APEs of these three models in this scenario

3.5.2 Scenario #2 – late night

The second scenario is late at night, when the traffic flow is low, ranging from under 700 vehicles per hour down to about 350 vehicles per hour. In this scenario, a large error occurs at the fifth interval: the low-level Kalman filter overshoots, while the RW model cannot catch up with the sudden change of the traffic flow. The high-level Kalman filter, however, accurately forecasts the traffic flow at this interval, since it catches the discrepancy between the predictions of the low-level Kalman filter and the RW model; the final prediction is compensated by the a posteriori estimate of this discrepancy. In this scenario, the RMSE of the H-KF2 is 38.87% better than that of the KF and 39.69% better than that of the RW, while the MAPE of the H-KF2 is 3.32 percentage points lower than that of the KF and 3.97 percentage points lower than that of the RW. The H-KF2 thus effectively improves the forecasting accuracy when the traffic flow is low, which is still a difficult issue in this area [3] (Fig. 7) (Table 3).

Table 3. Performance of models in scenario #2 – late night
H-KF2 KF RW
RMSE 65.83 91.42 91.96
MAPE 11.75 15.07 15.72
Scenario #2 – late night. The traffic flow is low in Fig. 7a. The Kalman filter and the RW model fail to make very accurate forecasts, while the proposed H-KF2 succeeds. Fig. 7b shows the APEs of these three models in this scenario

3.5.3 Scenario #3 – afternoon commuting hour

The third scenario is the shift to the commuting hour in the afternoon. The traffic flow grows from under 1500 vehicles per hour to more than 2800 vehicles per hour within one hour. Although the growth rate of the traffic flow keeps changing from moment to moment, which makes it hard to predict, the H-KF2 successfully handles these changes and provides more accurate forecasting results. This is because the H-KF2 makes the final prediction based on the prior predictions of both models, and estimates a more reasonable prediction according to the discrepancy between the two primary models. The RMSE of the H-KF2 is 7.27% better than that of the KF, and the MAPE of the H-KF2 is 2.02 percentage points lower than that of the KF (Fig. 8) (Table 4).

Table 4. Performance of models in scenario #3 – afternoon commuting hour
H-KF2 KF RW
RMSE 221.53 237.63 226.22
MAPE 6.67 8.69 7.46
Scenario #3 – afternoon commuting hour. The traffic flow shifts into the commuting hour in Fig. 8a. Although the growth rate of the traffic flow keeps changing from moment to moment, the H-KF2 forecasts more accurately than the Kalman filter and the RW model. Fig. 8b shows the APEs of these three models in this scenario

3.6 Performance evaluation

In order to demonstrate the superior performance of the H-KF2, we compare it in Table 5 with six models, including state-of-the-art ones, covering both parametric and non-parametric approaches that are often integrated into intelligent transportation systems. The fifth-week data from A1, A2, A4 and A8 are used for evaluation. Although the RMSE is better at exposing large deviations, the MAPE has the easiest interpretation and is useful for comparing the precision between the different volumes under study [54]. Furthermore, the MAPE is more suitable for comparing long time series. Since we would like to compare the performance of the models over a whole week, we use the MAPE as the evaluation criterion.

Table 5. MAPE of the H-KF2 and other models
A1, % A2, % A4, % A8, %
SVR 14.3 12.2 12.2 12.5
HA 16.9 15.5 16.7 16.2
RW 12.6 11.4 12.1 12.4
AR 13.6 11.6 12.7 12.7
ANN 12.6 10.9 12.5 12.5
KF 12.5 10.7 12.6 12.6
H-KF2 11.3 10.0 11.3 11.6

In Table 5, the H-KF2 outperforms the typical parametric and non-parametric methods. The H-KF2 is easy to implement and computationally fast, achieving competitively accurate and robust traffic flow forecasting in practice in large-scale real-time environments. The H-KF2 outperforms the RW model and the Kalman filter on all four datasets, since its predictions are rectified in the high-level Kalman filter according to the discrepancy between these two models. The HA is often used as a baseline for a new model, and the H-KF2 is well above this baseline. Meanwhile, the H-KF2 outperforms the non-parametric models while requiring much less time and space, and it works well with a very small training dataset. By contrast, the performances of the support vector machine regression, the ANN and the AR model depend heavily on the quantity and quality of the dataset [7], whereas the H-KF2 can work with limited data and is dynamically updated as new data arrive. It is worth pointing out that the H-KF2 can be integrated with other predictors and can be expanded to more layers.

For Table 6, all the models are implemented in Matlab™ 2017b on a workstation equipped with an Intel™ Core i7 3.6 GHz CPU and 16 GB RAM. The first four weeks' data are used for training and the fifth week's for testing. The HA and RW models are the fastest but not highly accurate. The training times of the support vector machine regression and the ANN increase with the quantity of the training data, and their prediction accuracy depends heavily on the quality of the training data. The AR model is fast, but it needs a manual decision on the order of the model, which requires expert domain knowledge and time to tune. The computation times of the H-KF2 and the direct Kalman filter are similar, but the H-KF2 is more accurate. Furthermore, the computation times of the H-KF2 and the Kalman filter remain approximately the same regardless of the amount of training data.

Table 6. Computation time of the H-KF2 and other models
A1, s A2, s A4, s A8, s
SVR 2.187 2.127 2.137 2.130
HA 0.000 0.000 0.000 0.000
RW 0.000 0.000 0.000 0.000
AR 0.001 0.001 0.001 0.001
ANN 3.003 3.053 3.027 3.134
KF 0.838 0.833 0.828 0.833
H-KF2 0.868 0.845 0.840 0.848

In Table 7, the EKF is included in the comparison. The EKF performs worse than the direct Kalman filter on A1 and A8, while it performs slightly better than the direct one on A2; on A4, the EKF outperforms the direct Kalman filter. This is because the Kalman gain is optimal if and only if the transition and observation models are linear in their arguments, which does not hold in the extended version. The H-KF2 outperforms both the direct Kalman filter and the EKF on A1, A2, A4 and A8.

Table 7. MAPE of the H-KF2 and the other Kalman family models
A1, % A2, % A4, % A8, %
EKF 13.1 10.6 11.9 12.8
KF 12.5 10.7 12.6 12.6
H-KF2 11.3 10.0 11.3 11.6

4 Conclusion

This paper presents a novel dual Kalman filter for short-term traffic flow forecasting. Our key idea is to leverage the discrepancy between the predictions of the traditional Kalman filter and the RW model. In our mechanism, the Kalman filter can dynamically estimate the a posteriori state of the dynamic system modelled by the prediction errors. In this way, we can produce calibrated compensation to the preliminary predictions. In the end, we test our model on four real-world datasets, compare it with various state-of-the-art models, and show the superiority of our model over the direct Kalman filter, the EKF, and other models.

In the future, we plan to explore the potential of our model for other applications, extend this framework to more layers, and integrate other models.

5 Acknowledgment

This work is supported by Natural Science Foundation of Guangdong Province (Grant No. 2018A030313291), STU Scientific Research Foundation for Talents (NTF18006), Science and Technology Planning Project of Guangdong Province (Grant No. 2016B010124012), and partly supported by Natural Science Foundation of China (Grant No. 61772206, U1611461, 61472145), Special Fund of Science and Technology Research and Development on Application From Guangdong Province (Grant No. 2016B010124011), and Guangdong High-level personnel of special support program (Grant No. 2016TQ03X319).

    7 Appendix

    7.1 Appendix 1

    From the definitions, we can prove that urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0222 in (26) is also Gaussian
    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0224(26)

    Proof.

    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0227(27)
    The characteristic function of urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0229 is
    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0231(28)
    Consider the traffic flow forecasting problem: urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0233 and urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0235 are numerical values, which can be regarded as 1 × 1 symmetric matrices whose squares are always non-negative. Since urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0239 and urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0241 are symmetric, non-negative-definite matrices, urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0243 is also a symmetric, non-negative-definite matrix. Therefore, urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0245 is Gaussian. □

    7.2 Appendix 2

    To investigate the mean of the state urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0247 propagation with time, we take the expected value of both sides of (16)
    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0249(29)
    Then we use (16) and (29) to derive the covariance of urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0251.
    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0253(30)
    In (30), urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0255 is the bias of the traffic flow predictions at time urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0257, while urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0259 is the propagation error of the bias at time t; in fact, they are uncorrelated. Since urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0261 is uncorrelated with urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0263, then
    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0265(31)
    Since the propagation in (29) and (31) from time t to urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0267 are before the observation, we denote this propagation as prior estimation with an additional ‘−’. After the observation, we denote the result as a posteriori estimation with the notation ‘ + ’. Thus, (29) and (31) can be rewritten as (18) and (19), respectively.
    In order to update urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0269 from the previous estimate urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0271 after urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0273 is observed, a gain term is employed, which we now derive
    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0275(32)
    The mean estimation error is derived in a recursive way in the following equation:
    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0277(33)
    Hence, we can derive the covariance of the estimation error after urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0279 observed
    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0281(34)
    Since urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0283 is independent of urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0285 , urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0287 , (34) can be simplified as the following equation
    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0289(35)
    With the propagation of the covariance, a criterion is introduced to obtain the optimal gain urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0291. As the diagonal entries in urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0293 are the variances of the estimation errors, we employ the cost function in (36), where urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0295 denotes the trace of a square matrix.
    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0297(36)
    Then we take the partial derivative of J to minimise the cost.
    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0299(37)
    Setting (37) to urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0301, the gain urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0303 can be derived as
    urn:x-wiley:1751956X:media:itr2bf00695:itr2bf00695-math-0305(38)
    In summary, the propagations of the state estimate and its covariance are given by the recursive equations (18)–(22).
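For reference, the derivation above yields the standard Kalman recursions; the symbols below (state estimate, covariance, process and observation matrices, noise covariances, gain) are our reconstruction of the notation rendered as images in this extraction and may differ from the authors' exact symbols:

```latex
% A priori (time-update) propagation, cf. (18)-(19):
\hat{x}^{-}_{t+1} = A\,\hat{x}^{+}_{t},
\qquad
P^{-}_{t+1} = A\,P^{+}_{t}\,A^{\mathsf{T}} + Q .

% Optimal gain from minimising J = \mathrm{tr}(P^{+}), cf. (38):
K_{t+1} = P^{-}_{t+1} H^{\mathsf{T}}
          \left( H P^{-}_{t+1} H^{\mathsf{T}} + R \right)^{-1} .

% A posteriori (measurement-update) correction, cf. (20)-(22):
\hat{x}^{+}_{t+1} = \hat{x}^{-}_{t+1}
                  + K_{t+1}\left( z_{t+1} - H\,\hat{x}^{-}_{t+1} \right),
\qquad
P^{+}_{t+1} = \left( I - K_{t+1} H \right) P^{-}_{t+1} .
```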