Hybrid dual Kalman filtering model for short-term traffic flow forecasting
Abstract
Short-term traffic flow forecasting is a fundamental and challenging task: it is required for the successful deployment of intelligent transportation systems, yet traffic flow changes dramatically over time. This study presents a novel hybrid dual Kalman filter (H-KF2) for accurate and timely short-term traffic flow forecasting. To achieve this, the H-KF2 first models the propagation of the discrepancy between the predictions of the traditional Kalman filter and the random walk model. By estimating the a posteriori state of the prediction errors of both models, the calibrated discrepancy is exploited to compensate the preliminary predictions. The H-KF2 requires time and space comparable to those of the traditional Kalman filter. Four real-world datasets and various experiments are employed to evaluate the authors’ model. The experimental results demonstrate that the H-KF2 outperforms the state-of-the-art parametric and non-parametric models.
1 Introduction and literature review
Accurate traffic flow forecasting is essential for many intelligent transportation systems. Such prospective forecasts not only feed into dynamic traffic assignment (DTA), but also provide routing information to the motoring public for their travel plans [1]. In addition, proactive forecasting can mitigate the adverse effects of traffic problems and support transportation network design.
The increasing availability of traffic flow data encourages researchers to develop more accurate forecasting methodologies that work in real-time [2]. These methodologies can be grouped into two classes: parametric approaches and non-parametric ones [3]. Parametric approaches include historical average (HA) [4], Kalman filtering methods [5-7], the autoregressive integrated moving average (ARIMA) model [8-10], seasonal ARIMA [11-13], multivariate time-series models [14-16], spectral analysis [17, 18], the structural time-series model [19, 20] and so on; whereas non-parametric ones include artificial neural network (ANN) [21, 22], non-parametric regression models [23], fuzzy logic system methods [24-26], support vector regression (SVR) [27-30] and so on. Parametric models capture all their information about the traffic status within their parameters, while non-parametric ones take an unspecified number of parameters to store the more subtle aspects of the traffic data implicitly, inevitably including noise. Hence, non-parametric models take more time and computational effort to learn optimal parameters.
In order to process large-scale traffic flow data in real-time, several parametric methods, including the family of Kalman filters, have been integrated into intelligent transportation systems. A family of Kalman filters has been developed for short-term traffic flow forecasting. The extended Kalman filter (EKF) approximates the non-linear system with first-order linearisation [31]. The second-order EKF estimates the state by maintaining the second-order terms of the Taylor expansions of the state and measurement propagations [32]. The unscented Kalman filter approximates the dynamic system based on the unscented transform [33]. The cubature Kalman filter (CKF) approximates the Gaussian filter using a spherical–radial cubature rule [34]. In this paper, a novel hybrid dual Kalman filter is developed based on the discrepancies of the estimations of the Kalman filter and the random walk (RW) model.
The Kalman family filters are efficient and effective parametric techniques that have been applied to real-world traffic flow data [5, 6, 35-37]. The Kalman filter works in an optimal recursive way with competitive performance for linear dynamic systems [38-40]. van Hinsbergen et al. [36] proposed a localised EKF for real-time traffic state estimation, and the experiments showed that their Kalman filter based models are fast enough for large-scale real-time applications. Wang et al. [7] proposed an adaptive Kalman filter based algorithm for short-term traffic flow forecasting, and its robust predictions with relatively small errors in a large-scale real-time environment demonstrate the strength of their proposal. Wang et al. [37] proposed an improved Kalman filter algorithm to balance the ratio of measurements to model results based on their variances, and an evaluation on real field data indicated the high real-time accuracy of their approach. However, the Kalman filter based methods are prone to overshooting in practice [7], which often occurs in the commuting hours, when congestion happens. Some researchers introduced the EKF to deal with this problem [36, 41, 42]. However, the Kalman gain is optimal if and only if the transition and observation models are linear in their arguments, a condition that does not hold in the extended versions. Thus, the Kalman gain is no longer optimal. A small bias of the initial state will lead to quick divergence of the model, and the covariance matrix tends to be undetermined [43].
Another feasible way is to combine smoothing techniques [44-46], such as wavelet denoising, to preprocess traffic flow data and improve the forecasting accuracy [5]. In these models, the current traffic flow can be represented as the weighted linear combination of recent traffic flow. The weight is the state of the dynamic linear system and the traffic flow is regarded as the observation. After the prior state is estimated, the traffic flow prediction is the weighted combination of the present and recent traffic flow. However, the smoothing techniques cannot fundamentally overcome the overshooting. Fortunately, not all traffic flow forecasting models are prone to overshooting, e.g. the RW model. Once the Kalman filter tends to overshoot, the discrepancy between the prediction of the Kalman filter and that of the RW model will vary.
Motivated by the variation of the discrepancy, we propose a novel model termed hybrid dual Kalman filter (H-KF2) for traffic flow forecasting. The low-level Kalman filtering model is inherited from Xie et al. [5]. Then another dynamic system is introduced to model the propagation of two discrepancies: the discrepancy between the expectation of the traffic flow and the prediction of the Kalman filter, and the discrepancy between the traffic flow expectation and the prediction of the RW model. The observation of this system is the discrepancy between the prediction of the Kalman filter and that of the RW model. This second Kalman filter estimates the discrepancies of the preliminary predictions. Then the optimal discrepancy is exploited to compensate for the preliminary prediction. In addition, this model works in constant time complexity and limited memory, which makes it suitable for large-scale real-time traffic flow forecasting. In the end, extensive experiments demonstrate the superior performance of the proposal.
2 Methodology
As discussed above, when the Kalman filter tends to overshoot, the discrepancy between the prediction of the Kalman filter and the prediction of the RW model varies. Motivated by this observation, the evolution of traffic flow is modelled as a dynamic linear system. A Kalman filter, named the low-level Kalman filter, is employed to estimate the prior state of the system, and the prediction of the traffic flow is derived from this prior state. Then the discrepancy vector, which contains the discrepancy between the traffic flow expectation and the prediction of the low-level Kalman filter and the discrepancy between the expectation and the prediction of the RW model, evolves as the state of another dynamic linear system. The high-level Kalman filter is introduced to estimate the a posteriori state of the latter dynamic system. The estimated discrepancies are then exploited to improve the final prediction. The basic idea of the proposed framework is illustrated in Fig. 1. In this figure, nodes with different colours represent different data types, and the intensity of the colour denotes the certainty. For example, given the a posteriori state of the first dynamic system at the current time step, we can estimate the prior state for the next time step using the low-level Kalman filter. Here, the a posteriori estimate is made after the traffic flow at the current time step has been observed, while the prior estimate is made before the observation at the next time step. The prediction of the low-level Kalman filter is deduced from this prior state. Similarly, the state of the second dynamic system is a vector of two dimensions: the first element is the discrepancy between the prediction of the RW model and the traffic flow expectation, and the second is the discrepancy between the traffic flow expectation and the prediction of the low-level Kalman filter. Since the discrepancy between the predictions of the low-level Kalman filter and the RW model is observable in the second dynamic system, we can estimate the a posteriori discrepancy vector, which is used to improve the final prediction.

2.1 Low-level Kalman filter
















The one-step prediction is derived from the prior estimation of the state. Lan [47], Xie et al. [5] and Wang et al. [7] also made a similar assumption.
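As a rough illustration of the low-level filter, the following Python sketch treats the weights of the n most recent flows as the state, uses an identity process matrix, and forms the one-step prediction from the prior state estimate. The function name `low_level_kf`, the uniform initial weights, and the numerical values of `q`, `r` and `p0` are assumptions made for this example only; they are not the settings used in the experiments.

```python
import numpy as np

def low_level_kf(flow, n=8, q=1e-4, r=1e-2, p0=1.0):
    """One-step-ahead forecasts from a Kalman filter whose state is a weight vector.

    Assumed model (illustrative only):
        w_t = w_{t-1} + process noise        (identity process matrix)
        y_t = h_t @ w_t + measurement noise  (h_t = the n most recent flows)
    """
    flow = np.asarray(flow, dtype=float)
    w = np.full(n, 1.0 / n)          # assumed initial state: uniform weights
    P = p0 * np.eye(n)               # assumed initial error covariance
    Q = q * np.eye(n)                # assumed process noise covariance
    preds = np.full(flow.shape, np.nan)
    for t in range(n, len(flow)):
        h = flow[t - n:t][::-1]      # most recent measurement first
        # Time update: identity transition, so only the covariance grows.
        P = P + Q
        preds[t] = h @ w             # prediction from the prior state
        # Measurement update once flow[t] has been observed.
        s = h @ P @ h + r            # innovation variance (scalar)
        k = P @ h / s                # Kalman gain
        w = w + k * (flow[t] - preds[t])
        P = P - np.outer(k, h @ P)
    return preds
```

Each step touches only the n-dimensional state and its n-by-n covariance, so the cost per forecast does not depend on the length of the series.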














2.2 High-level Kalman filter
The Kalman filter is a good predictor for the near future, while the RW model is informative for the most recent state of the traffic flow [50]. The Kalman filtering model is prone to overshooting, while the RW model is not. However, the discrepancy between the prediction of the Kalman filter and the ground truth, as well as the discrepancy between the prediction of the RW model and the ground truth, are difficult to estimate accurately. Fortunately, the discrepancy between the prediction of the Kalman filter and that of the RW model is observable. We can co-estimate both discrepancies by modelling them as a dynamic system. After the informative observation of the discrepancy between these two predictors, the estimation can be further improved according to the predictions of both models.
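The co-estimation idea can be sketched in Python as follows. This is one possible reading rather than the exact formulation: the sign convention of the two discrepancies, the identity transition, the observation vector `H = [1, 1]`, the function name `high_level_step`, and the process noise `Q` passed by the caller are all assumptions of this example.

```python
import numpy as np

def high_level_step(d, P, y_rw, y_kf, Q, r=0.0):
    """One update of an assumed two-state discrepancy filter.

    d[0] ~ (RW prediction) - (expected flow)
    d[1] ~ (expected flow) - (KF prediction)
    so d[0] + d[1] = y_rw - y_kf, which is known before the flow is measured.
    """
    H = np.array([1.0, 1.0])         # observation picks up d[0] + d[1]
    P = P + Q                        # identity transition (assumption)
    z = y_rw - y_kf                  # observed discrepancy between the predictors
    s = H @ P @ H + r                # innovation variance
    k = P @ H / s                    # Kalman gain
    d = d + k * (z - H @ d)          # a posteriori discrepancies
    P = P - np.outer(k, H @ P)
    # Compensate both preliminary predictions and average them.
    y_final = 0.5 * ((y_rw - d[0]) + (y_kf + d[1]))
    return d, P, y_final
```

With zero observation noise, the two compensated predictions coincide, so the averaging in the last step is only a safeguard for `r > 0`. For instance, starting from zero discrepancies and zero covariance with a hypothetical `Q = diag(3, 1)`, an RW prediction of 100 and a Kalman filter prediction of 108 yield a final prediction of 106, i.e. the estimate moves further from the predictor that is assumed to be noisier.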


































From the definitions in (14) and (15), once we have the a posteriori estimate of the discrepancy state, the final prediction of the traffic flow at the next time step is easily deduced by compensating the preliminary predictions with the estimated discrepancies. The mathematical explanation of this recursive model can be found in the Appendix (Section 7).
2.3 Summary
The algorithm is summarised in Algorithm 1 (see Fig. 2). As we see, this method works in constant time with limited memory. Thus, it is suitable for large-scale real-time traffic flow forecasting.

3 Case study
In this section, the traffic flow data from four motorways A1, A2, A4 and A8 ending on the ring road (A10 motorway) of Amsterdam are used for the empirical study.
3.1 Data description
- The A1 motorway, which is also part of a European route, connects the city of Amsterdam with the German border. The European route E30 follows the A1 motorway from the interchange Hoevelaken in the Netherlands. The A1 motorway has the first barrier-separated high-occupancy vehicle (HOV) 3+ lane in Europe.
- The A2 motorway, one of the busiest highways in the Netherlands, connects the city of Amsterdam and the Belgian border.
- The A4 motorway is part of the Rijksweg 4, which runs from Amsterdam to the Belgian border. The A4 motorway has priority from the eastern direction until the interchange De Nieuwe Meer, then travels to the southeast.
- The A8 motorway starts from the A10 motorway at interchange Coenplein and ends at Zaandijk, less than 10 km away.
The four measurement sites are located on the inflow of the motorways, a short distance before the merge points to the ring road. Fig. 3 shows the road network in Amsterdam [3]. The data cover the period from 20 May 2010 until 24 June 2010; they were collected by MONICA sensors and aggregated at 1-min intervals.

The raw data contain incorrect measurements, namely zeros or negative values lasting for a long period. These incorrect data were corrected by averaging the measurements taken at the same moments in other weeks.
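A minimal Python sketch of this kind of correction is given below. It simplifies the rule by treating every non-positive reading as incorrect, whereas only long runs of such values are described as incorrect here; the function name `repair_invalid` and the `period` argument are illustrative.

```python
import numpy as np

def repair_invalid(flow, period):
    """Replace non-positive readings by the mean of the valid readings taken
    at the same position within the other weeks.

    flow   : 1-D sequence covering a whole number of weeks
    period : number of samples per week (e.g. 7 * 24 * 60 for 1-min data)
    """
    weeks = np.asarray(flow, dtype=float).reshape(-1, period).copy()
    valid = weeks > 0                       # simplification: zeros and negatives are invalid
    counts = valid.sum(axis=0)
    sums = np.where(valid, weeks, 0.0).sum(axis=0)
    # Mean over the valid weeks; stays NaN where no week has a valid reading.
    means = np.full(period, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    rows, cols = np.nonzero(~valid)
    weeks[rows, cols] = means[cols]
    return weeks.ravel()
```

Because an invalid reading is excluded from the mean of its own slot, the replacement value is the average over the remaining weeks only.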
3.2 Control models description
Six frequently used forecasting models, which are often integrated into intelligent transportation systems, are involved in the evaluation. The hybrid particle swarm optimisation SVR model is detailed in [51]; it uses the data of the first four weeks for training and the rest for evaluation. The HA model [30] and the Kalman filtering model [5] (the low-level Kalman filter) are introduced above. The RW model [30] simply regards the last measurement as the prediction. The HA and RW models are often used as baselines for new models. The autoregression (AR) model is trained with the data from the first four weeks and evaluated on the fifth. The ANN is detailed in Zhu et al. [52]. The settings of these six models are described below.
Support vector machine regression: For the support vector machine regression model, several parameters need to be set beforehand. The regression horizon is set the same as for the AR model. We use the radial basis function (RBF) as the kernel type in this study. The cost parameter C is set to the maximum difference between the traffic flow measurements. The width parameter of the RBF kernel and the ε-insensitive loss are determined by particle swarm optimisation; the ε-insensitive loss for the SVR is 1 in this study.
HA: For a given time of day, this model predicts the average of the measurements taken at the same time on the same day in previous weeks.
RW: This model simply predicts the traffic flow next moment as equal to the current condition.
Kalman filtering: The wavelet denoising procedure proposed by Xie et al. [5] is employed to preprocess the traffic flow data. We use Daubechies 4 as the mother wavelet, as suggested in [5]. The variance of the measurement noise is set to 0, since we regard the measurements as correct. In the initial state, n is set to 8, the same as in Xie et al. [5]. The covariance matrix of the initial state estimation error is also set in advance.
AR: The AR model is a representation of a random process and it has been widely used in traffic flow forecasting due to the randomness of the traffic flow. In the AR model of order p, the current traffic flow is represented by a weighted combination of the values of the previous p periods plus a random disturbance in the current period. In this regard, the order p is critical for AR models. If the order is too low, the model cannot capture the dynamics of the traffic flow; on the other hand, if the order is too high, more coefficients need to be estimated, and additional errors will consequently be introduced. The order in our experiment is set to 8 by cross-validation on our training data.
ANN: We employ the ANN introduced in Zhu et al. [52]. The number of hidden layers is set to 1. The optimisation goal is 0.001. The spread is 2000. The number of nodes in the hidden layer is 40. Most of the network parameters are consistent with [52].
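The three simplest control models can be sketched in a few lines of Python. The sketch is illustrative only: the function names are hypothetical, and fitting the AR coefficients by ordinary least squares with an intercept is an assumption of this example, since the fitting procedure is not specified above.

```python
import numpy as np

def rw_forecast(flow):
    """Random walk: the forecast for t is the measurement at t - 1."""
    flow = np.asarray(flow, dtype=float)
    pred = np.full(flow.shape, np.nan)
    pred[1:] = flow[:-1]
    return pred

def ha_forecast(train, period):
    """Historical average: mean of the training weeks at each time-of-week slot."""
    return np.asarray(train, dtype=float).reshape(-1, period).mean(axis=0)

def fit_ar(train, p=8):
    """Least-squares AR(p) coefficients with intercept (assumed fitting method)."""
    y = np.asarray(train, dtype=float)
    # The row for target y[t] holds [1, y[t-1], ..., y[t-p]].
    lags = np.column_stack([y[p - k:len(y) - k] for k in range(1, p + 1)])
    X = np.column_stack([np.ones(len(y) - p), lags])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def ar_forecast(coef, recent):
    """One-step forecast from the p latest values, ordered oldest to newest."""
    p = len(coef) - 1
    lagged = np.asarray(recent, dtype=float)[-p:][::-1]
    return coef[0] + coef[1:] @ lagged
```

The RW and HA models need no fitting, which is consistent with their negligible computation time, whereas the AR model requires the order p to be chosen before fitting.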
3.3 Evaluation criteria
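The criteria referred to in the remainder of this section are the RMSE, the APE and the MAPE. As a reminder, the Python sketch below implements their usual textbook definitions; it is not a reproduction of the notation of this paper, and the function names are illustrative.

```python
import numpy as np

def ape(actual, predicted):
    """Absolute percentage error of each forecast, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.abs(actual - predicted) / np.abs(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(ape(actual, predicted)))

def rmse(actual, predicted):
    """Root mean square error, in the unit of the measurements."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

The APE is undefined when a measurement is zero, which is one reason why low-flow periods weigh heavily in the MAPE.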





3.4 Experiments setup
The collected data are divided into two parts: the first four weeks are used for training and the rest is used for evaluation. The process noise covariance of the low-level Kalman filter is estimated by the EM algorithm, which is detailed in Section 2.1, using the data of the first four weeks. The measurement noise covariance of the low-level Kalman filter is set to 0, and the process matrix of this first-level Kalman filter is set to an identity matrix. The initial covariance of the estimation error and the initial system state are set to fixed values.
The measurements are aggregated into 1, 5, and 10 min intervals. The first-week measurements of A1 are displayed in Fig. 4, which shows many fluctuations in the 1 and 5 min aggregations. Traffic flow data aggregated at a high resolution tend to be noisy, which decreases the accuracy of the forecasting models. On the other hand, traffic flow data of too low a resolution provide little information for forecasting [1]. Another study reported that the resolution of the data should be equal to that of the data to be predicted [53]. Since we aim at short-term traffic flow forecasting 10 min ahead, and this work is not intended to forecast the minute-wise fluctuations, the 10-min average aggregations are used for the evaluations.
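The aggregation step can be sketched as follows in Python, assuming non-overlapping windows and simple averaging; the function name `aggregate_mean` is illustrative.

```python
import numpy as np

def aggregate_mean(flow_1min, window):
    """Average consecutive 1-min measurements over non-overlapping windows."""
    flow = np.asarray(flow_1min, dtype=float)
    usable = (len(flow) // window) * window   # drop an incomplete trailing window
    return flow[:usable].reshape(-1, window).mean(axis=1)
```

Calling `aggregate_mean(flow, 5)` or `aggregate_mean(flow, 10)` on the 1-min series gives the 5-min and 10-min series, respectively.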




Dataset | Source | SS | df | MS | F | p |
---|---|---|---|---|---|---|
A1 | columns | 1,685,318.36 | 1 | 1,685,318.36 | 20.34 | 0.000007 |
A1 | error | 666,805,383.27 | 8046 | 82,874.15 | | |
A1 | total | 668,490,701.63 | 8047 | | | |
A2 | columns | 1,750,149.09 | 1 | 1,750,149.09 | 31.32 | 0.00000002 |
A2 | error | 449,558,440.01 | 8046 | 55,873.53 | | |
A2 | total | 451,308,589.10 | 8047 | | | |
A4 | columns | 1,470,660.99 | 1 | 1,470,660.99 | 20.24 | 0.000007 |
A4 | error | 584,733,671.62 | 8046 | 72,673.83 | | |
A4 | total | 586,204,332.62 | 8047 | | | |
A8 | columns | 375,921.70 | 1 | 375,921.70 | 11.99 | 0.0005 |
A8 | error | 252,293,036.27 | 8046 | 31,356.33 | | |
A8 | total | 252,668,957.97 | 8047 | | | |
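The MS and F columns of the ANOVA table follow from the sums of squares and the degrees of freedom. The short Python check below recomputes them for the A1 rows, assuming the standard one-way ANOVA relations; the function name `anova_f` is illustrative.

```python
def anova_f(ss_between, df_between, ss_error, df_error):
    """Return (MS_between, MS_error, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return ms_between, ms_error, ms_between / ms_error

# Values of the A1 rows of the ANOVA table.
_, ms_error, f_stat = anova_f(1_685_318.36, 1, 666_805_383.27, 8046)
```

Rounded to two decimals, `ms_error` and `f_stat` agree with the tabulated 82,874.15 and 20.34.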

The covariance matrix of the observation noise is set to 0. The covariance of the estimation error of the high-level Kalman filter is initially set to 0.
3.5 Scenarios of study
To demonstrate the performance of the proposed H-KF2, we introduce several typical and difficult scenarios that often happen in road networks. Three scenarios are selected from the first day of the last week on the A1 motorway. Selecting the data from the same site on a single day makes it more convincing that the forecasting results are not coincidental.
3.5.1 Scenario #1 – rush hour
The first scenario is the rush hour, when the traffic flow increases rapidly, as shown in Fig. 6a. The traffic flow grows from about 1300 vehicles per hour to over 2800 vehicles per hour. The low-level Kalman filter is prone to overshooting when the traffic flow increases quickly or drops suddenly. The RW model, like most other forecasting models, can hardly follow these rapid fluctuations. In this scenario, our model acts as a trade-off between both models: the H-KF2 prevents the overshooting of the Kalman filter and the lagging of the RW model. The corresponding APEs of the H-KF2, the Kalman filter and the RW model in this scenario are shown in Fig. 6b. As we see, the H-KF2 alternately performs better than the Kalman filter and better than the RW model. The overall performance evaluated by the RMSE and MAPE is listed in Table 2. Expressed relative to the RMSE of the H-KF2, the improvement is 2.34% over the KF and 4.03% over the RW. The H-KF2 thus outperforms the other two models overall in such a difficult scenario. Such improvement is important in intelligent transportation systems since it reduces the variance of the accuracy of the predictor, which is a critical issue in traffic management and traffic information publishing. Moreover, this model adds little time and space overhead to existing systems.
Criterion | H-KF2 | KF | RW |
---|---|---|---|
RMSE | 198.78 | 203.44 | 206.80 |
MAPE, % | 7.74 | 8.01 | 8.00 |

3.5.2 Scenario #2 – late night
The second scenario is late at night, when the traffic flow is low. The traffic flow ranges from less than 700 vehicles per hour to about 350 vehicles per hour in this scenario. A large error occurs at the fifth interval. At this interval, the low-level Kalman filter overshoots, while the RW model cannot catch up with the sudden change of the traffic flow. The high-level Kalman filter accurately forecasts the traffic flow at this interval, because it captures the discrepancy between the predictions of the low-level Kalman filter and the RW model. The final prediction is compensated by the a posteriori estimate of the discrepancy between the predictions of both models. In this scenario, the RMSE improvement of the H-KF2, expressed relative to its own RMSE, is 38.87% over the KF and 39.69% over the RW. The MAPE of the H-KF2 is 3.32 percentage points lower than that of the KF and 3.97 percentage points lower than that of the RW. The H-KF2 thus helps to improve the forecasting accuracy effectively when the traffic flow is low, which is still a difficult issue in this area [3] (Fig. 7 and Table 3).
Criterion | H-KF2 | KF | RW |
---|---|---|---|
RMSE | 65.83 | 91.42 | 91.96 |
MAPE, % | 11.75 | 15.07 | 15.72 |

3.5.3 Scenario #3 – afternoon commuting hour
The third scenario is the shift of the commuting time in the afternoon. The traffic flow grows from under 1500 vehicles per hour to more than 2800 vehicles per hour within one hour. Although the rate of increase of the traffic flow changes at every moment, which makes it hard to predict, the H-KF2 handles these changes successfully and provides more accurate forecasting results. This is because the H-KF2 makes the final prediction based on the prior predictions of both models, and estimates a more reasonable prediction according to the discrepancy between both primary models. The RMSE improvement of the H-KF2 over the KF, expressed relative to the RMSE of the H-KF2, is 7.27%, and the MAPE of the H-KF2 is 2.02 percentage points lower than that of the KF (Fig. 8 and Table 4).
Criterion | H-KF2 | KF | RW |
---|---|---|---|
RMSE | 221.53 | 237.63 | 226.22 |
MAPE, % | 6.67 | 8.69 | 7.46 |
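The RMSE improvements quoted for the three scenarios can be recomputed from the scenario tables. The Python sketch below assumes that each improvement is expressed relative to the RMSE of the H-KF2, which is the convention that matches the quoted percentages; the helper name `relative_gain` is illustrative.

```python
def relative_gain(other, hkf2):
    """Gain of the H-KF2 over another model, in percent of the H-KF2 value."""
    return 100.0 * (other - hkf2) / hkf2

# RMSE values (H-KF2, KF, RW) copied from the three scenario tables.
rmse = {
    "scenario 1": (198.78, 203.44, 206.80),
    "scenario 2": (65.83, 91.42, 91.96),
    "scenario 3": (221.53, 237.63, 226.22),
}
gains = {
    name: (round(relative_gain(kf, h), 2), round(relative_gain(rw, h), 2))
    for name, (h, kf, rw) in rmse.items()
}
```

The MAPE differences quoted in the text, by contrast, are plain differences in percentage points, e.g. 15.07 - 11.75 = 3.32 in scenario #2.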

3.6 Performance evaluation
In order to demonstrate the superior performance of the H-KF2, we compare it with six models, including state-of-the-art ones, in Table 5. The six comparison models include parametric and non-parametric ones that are often integrated into intelligent transportation systems. The fifth-week data from A1, A2, A4, and A8 are used for evaluation. Although the RMSE is better at showing larger deviations, the MAPE has the easiest interpretation and is useful for comparing the precision between different volumes under study [54]. Furthermore, the MAPE is more suitable for the comparison of long time series. Since we would like to compare the performance of the models over a whole week, we use the MAPE as the evaluation criterion.
Model | A1, % | A2, % | A4, % | A8, % |
---|---|---|---|---|
SVR | 14.3 | 12.2 | 12.2 | 12.5 |
HA | 16.9 | 15.5 | 16.7 | 16.2 |
RW | 12.6 | 11.4 | 12.1 | 12.4 |
AR | 13.6 | 11.6 | 12.7 | 12.7 |
ANN | 12.6 | 10.9 | 12.5 | 12.5 |
KF | 12.5 | 10.7 | 12.6 | 12.6 |
H-KF2 | 11.3 | 10.0 | 11.3 | 11.6 |
In Table 5, the H-KF2 outperforms the typical parametric and non-parametric methods. The H-KF2 is easy to implement and fast to compute, and it achieves competitively accurate and robust traffic flow forecasting in practice in a large-scale real-time environment. The H-KF2 outperforms the RW model and the Kalman filter on all four datasets, since the predictions of the H-KF2 are rectified in the high-level Kalman filter according to the discrepancy between these two models. The HA is often used as a baseline for a new model, and the H-KF2 performs much better than this baseline. Meanwhile, the H-KF2 performs better than the non-parametric models with much less time and space. Moreover, the H-KF2 can work well with a very small training data set. In contrast, the performances of the support vector machine regression, the ANN, and the AR model depend highly on the quantity and quality of the dataset [7], whereas the H-KF2 can work with limited data and be dynamically updated with newly arriving data. It is worth pointing out that the H-KF2 can be integrated with other predictors and can be expanded to more layers.
Table 6 lists the computation times. All the models are implemented using Matlab™ 2017b on a workstation equipped with an Intel™ Core i7 3.6 GHz processor and 16 GB of RAM. The first four weeks’ data are used for training and the fifth week’s data for testing. The HA and RW models are the fastest but are not highly accurate. The training times of the support vector machine regression and the ANN increase with the quantity of the training data, and their prediction accuracy depends highly on the quality of the training data. The AR model is fast, but the order of the model must be chosen manually, which requires expert domain knowledge and time to adjust the model. The computation times of the H-KF2 and the direct Kalman filter are similar, but the H-KF2 is more accurate. Furthermore, the computation times of the H-KF2 and the Kalman filter remain approximately the same regardless of the growth of the training data.
Model | A1, s | A2, s | A4, s | A8, s |
---|---|---|---|---|
SVR | 2.187 | 2.127 | 2.137 | 2.130 |
HA | 0.000 | 0.000 | 0.000 | 0.000 |
RW | 0.000 | 0.000 | 0.000 | 0.000 |
AR | 0.001 | 0.001 | 0.001 | 0.001 |
ANN | 3.003 | 3.053 | 3.027 | 3.134 |
KF | 0.838 | 0.833 | 0.828 | 0.833 |
H-KF2 | 0.868 | 0.845 | 0.840 | 0.848 |
In Table 7, the EKF is included in the comparison. The EKF performs worse than the direct Kalman filter on the A1 and A8, while it performs slightly better than the direct one on the A2. On the A4, the EKF outperforms the direct Kalman filter. This inconsistency arises because the Kalman gain is optimal if and only if the transition and observation models are linear in their arguments, a condition that does not hold in the extended versions. The H-KF2 outperforms the direct Kalman filter and the EKF on the A1, A2, A4, and A8.
Model | A1, % | A2, % | A4, % | A8, % |
---|---|---|---|---|
EKF | 13.1 | 10.6 | 11.9 | 12.8 |
KF | 12.5 | 10.7 | 12.6 | 12.6 |
H-KF2 | 11.3 | 10.0 | 11.3 | 11.6 |
4 Conclusion
This paper presents a novel dual Kalman filter for short-term traffic flow forecasting. Our key idea is to leverage the discrepancy between the predictions of the traditional Kalman filter and the RW model. In our mechanism, the Kalman filter can dynamically estimate the a posteriori state of the dynamic system modelled by the prediction errors. In this way, we can produce calibrated compensation to the preliminary predictions. In the end, we test our model on four real-world datasets, compare it with various state-of-the-art models, and show the superiority of our model over the direct Kalman filter, the EKF, and other models.
In the future, we plan to explore the potential of our model for other applications, extend this framework to more layers, and integrate other models.
5 Acknowledgment
This work is supported by Natural Science Foundation of Guangdong Province (Grant No. 2018A030313291), STU Scientific Research Foundation for Talents (NTF18006), Science and Technology Planning Project of Guangdong Province (Grant No. 2016B010124012), and partly supported by Natural Science Foundation of China (Grant No. 61772206, U1611461, 61472145), Special Fund of Science and Technology Research and Development on Application From Guangdong Province (Grant No. 2016B010124011), and Guangdong High-level personnel of special support program (Grant No. 2016TQ03X319).
7 Appendix
7.1 Appendix 1


Proof.










7.2 Appendix 2





























