EP-LSTM: Novel prediction algorithm for moving object destination

Predicting the destination of a moving object is a popular research topic in location-based services. By predicting destinations, suggestions can be offered to people regarding their trips. At present, prediction methods based on historical trajectories suffer from problems such as data sparsity and long-term dependence, which affect prediction accuracy. To solve the data sparsity problem, this paper devises an improved minimum description length method, which incorporates weighting parameters and optimizes the partitioning of trajectories with an undirected complete graph. Long short-term memory is a trajectory-prediction model that addresses the long-term dependence problem, but it tends to suffer from vanishing gradients when processing longer sequences, because its hidden layers are strongly affected by sequence length. Using embedded technology, the authors convert trajectory sequences into embedded vector sequences and propose a deep-learning prediction model, EP-LSTM (Embedded Processing Long Short-Term Memory), which integrates embedded technology with long short-term memory. The authors have conducted extensive tests on real data sets, comparing EP-LSTM with currently available prediction methods. The results show that EP-LSTM not only effectively alleviates data sparsity and long-term dependence but also achieves a high degree of prediction accuracy.


INTRODUCTION
With the development of location-based services (LBS), many mobile positioning devices have emerged, which can provide users with precise moving trajectories using global positioning systems (GPS, BeiDou, etc.). More importantly, by studying trajectory data, we find that human social activities have a certain regularity and that people repeatedly arrive at certain places [1]. For instance, everyone has their own behavioural habits: they are used to working at a certain company, shopping at a certain mall, and eating at a certain restaurant, and even the means of transportation they choose for travelling to and from these places is nearly the same. Everyone has their own 'circle', and they visit friends and relatives from time to time. The purpose of destination prediction is to predict the destination before the moving object sets off, namely, the location of the moving object at a certain time in the future. Coordinate information is now no longer a simple pair of longitude and latitude or plane horizontal and vertical coordinate values. Trajectories imply the action rules and preferences of massive numbers of moving objects. If we obtain the location information of moving objects, we can push relevant services to them in a timely and accurate manner, such as information about nearby food or discounts at a shopping mall. Destination prediction is an essential task for many emerging location-based applications, including intelligent navigation [2] and detecting road congestion for traffic planning.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2020 The Authors. IET Intelligent Transport Systems published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
In terms of people's lives [3], urban planning managers can understand the pattern of residents' activities for urban planning. In terms of disease transmission, destination prediction for moving objects can help control the movement of residents. In terms of mining hot spots for business analysis [4], advertising can be made more accurate and efficient. The destination prediction of moving objects is an important part of the field of trajectory prediction. In recent years, many researchers have studied the trajectory prediction of moving objects and made some achievements in this field. Trajectory prediction models can be mainly divided into the following three types.
The first type is prediction methods based on sequence analysis. In [5], Morzy adopted an improved algorithm to generate association rules. In [6], the authors constructed a T-pattern decision tree from the movement patterns in trajectories and found the best matching path in the tree to predict the next position of the trajectory. However, this method has a significant drawback: it tends to ignore data sparsity. The trajectories obtained for the moving object are far from covering all trajectory queries in the data set, which is called the data sparsity problem; hence, the model loses its predictive ability. At present, the methods for solving the data sparsity problem are gridding [7], clustering [8], and similarity matching [9,10], but they all have shortcomings. For example, the gridding method loses some important trajectory information, while the similarity matching method has a reduced matching rate when there is little historical trajectory information, resulting in inaccurate predictions.
The second type is prediction based on Markov statistical learning. In [11], a simple Markov model is used for position prediction; it only considers the influence of the current position on the future position, resulting in low prediction accuracy. In [12], a hybrid Markov model is used to predict the moving paths of pedestrians, which reduces the space cost of the second-order Markov model and effectively predicts the future positions of pedestrians. In [13], Yang et al. proposed a DBSCAN-based algorithm to obtain residence points and predicted the next position of the moving object using a variable-order Markov model. However, Markov-based trajectory destination prediction algorithms only consider the two or three GPS points at the front of the sequence, which is not suitable for trajectories with long dependence.
The third type is prediction based on deep neural networks (DNNs). In [14], Lv et al. designed the T-CONV model to predict the destination of moving objects by using a multi-level convolutional neural network to extract multi-scale two-dimensional trajectory features. In [15], Rossi et al. used geographic information in a location-based social network, encoded the location semantics of each visit, and then established a recursive neural network model to predict the next destination. However, these neural network models are not good at memorizing long trajectory sequences, resulting in unsatisfactory prediction. In particular, the recurrent neural network (RNN) [16], one of the deep learning networks, has the problems of gradient disappearance and gradient explosion and always faces the challenge of the long-term dependence problem. When the number of trajectory points that the destination prediction depends on increases, and the relevant point is too far from the output time, this situation is called the long-term dependence problem. Long short-term memory (LSTM) [17] is very popular for trajectory prediction as a way to address long-term dependence. However, the LSTM model is not effective in dealing with long sequences: its hidden layer is greatly affected by the length of the trajectory data sequence, which requires low-dimensional vector input; otherwise, it leads to a serious dimensionality disaster and increased time overhead, thereby reducing the learning efficiency of the model. Therefore, effectively predicting the destination based only on historical trajectories is still a great challenge.
To fully address the abovementioned issues, this paper proposes a deep learning prediction model named EP-LSTM (Embedded Processing Long Short-Term Memory), which is validated on real data sets. It not only effectively alleviates data sparsity and long-term dependence but also achieves better prediction results. Note that this paper focuses on the destination prediction of moving objects, but the present work can also be applied to location prediction in other fields, such as prediction of typhoon landing points and prediction of epidemic virus transmission.
In summary, we make the following contributions in this paper.
1. We put forward an improved minimum description length (IMDL) method, which transforms trajectory segmentation into a shortest-path problem on an undirected complete graph, so as to obtain the best segmentation of the trajectory. As a result, this method effectively overcomes the data sparsity problem in destination prediction.
2. We propose a deep learning prediction model named EP-LSTM. First, in order to reduce the dimension of the input sequence, the input sequence is transformed into embedded vectors using embedded technology. Furthermore, the feature vectors are extracted and used as the input for LSTM, which effectively alleviates the long-term dependence problem. Consequently, the EP-LSTM algorithm can effectively solve long-term dependence without neglecting the movement patterns of moving objects.
3. We conduct extensive experiments with the real trajectory data set of the Kaggle ECML/PKDD competition [18]. Prediction accuracy and distance error are used to evaluate the effectiveness and efficiency of the algorithm. A large number of experimental results on real trajectory data show that the accuracy of our algorithm is 82.4%, far beyond the currently existing algorithms.
The remainder of this paper is organized as follows: Section 2 offers the problem statement and the methods analysis. Section 3 presents the EP-LSTM algorithm in detail. Section 4 reports experimental results. Finally, Section 5 concludes the paper.

DESTINATION PREDICTION ALGORITHM EP-LSTM
For destination prediction of moving objects, the common method is to build a model based on historical trajectories. If there exists a historical trajectory that includes part of the current trajectory, the terminal point of the historical trajectory is regarded as the destination of the current trajectory. However, this method does not take into account the limited number of trajectories and the limited amount of information in the data set, and destination prediction is also affected by other objective factors, such as the data sparsity problem. Moreover, the existing prediction methods based on Markov chains and on DNNs such as the RNN cannot solve the long-term dependence problem: a Markov chain depends on only a few GPS points, historical data have a certain timeliness, and the RNN suffers from gradient disappearance, which leaves long-term dependence unsolved.
In order to address the problems of data sparsity and long-term dependence, the trajectory destination must be predicted accurately while maintaining trajectory information. A novel method named the EP-LSTM algorithm is proposed to predict the destination. Figure 1 shows the flow chart of algorithm training, and Figure 2 shows the flow chart of applying the algorithm to prediction. First, the historical trajectory data are segmented: the IMDL method is used to process the original trajectory data and, while retaining the original trajectory characteristics, each trajectory is segmented to solve data sparsity, as detailed in Section 2.2. Then, an embedded technology is presented, which extracts the feature vectors and converts the trajectory sequence into an embedded vector sequence as the input of the model; the LSTM model is then trained on this data set. The problem of long-term dependence is thus effectively solved, which largely makes up for the deficiency of the RNN model. Finally, we put the processed training data into the LSTM model for training and select the best training output as the prediction model.
In Figure 2, the test trajectory data are preprocessed first, and the trajectory sequence is converted into an embedded vector by the embedded technology. Finally, the trajectory destination is predicted by inputting the sequence vectors into the EP-LSTM algorithm, as detailed in Section 2.3.

Basic definitions and problem statements

Definition 1. (Trajectory sequence):
A trajectory is a collection of GPS trajectory points generated by a moving object within a certain period of time. O is a collection of moving objects. The trajectory of a moving object O_i is represented by T_i = (p_1, p_2, …, p_n), where p_i represents the location of the moving object at time t_i, and λ_i and φ_i correspond to the latitude and longitude of p_i, respectively. T_{1:i} = (p_1, p_2, …, p_i) is defined as the part of the moving object's trajectory from the beginning to the current point, i ≤ n. V_s = {V_1, V_2, …, V_{s−1}} represents the vector sequence onto which the trajectory is mapped.

Definition 2. (Destination prediction of moving objects):
Assume that T is a historical trajectory composed of k historical position sequences, where T can be written as (p_1, p_2, …, p_k). The current trajectory is a partial trajectory T_{1:i}. When i is equal to s, the position of the moving object at time i is expressed as T_s.

Trajectory data pre-processing
Before inferring the trajectories of moving objects, we have to preprocess the trajectory data. In this section, we introduce a well-designed preprocessing method to achieve accurate prediction of the destination. In order to make the trajectory more concise and convenient, it is necessary to segment the trajectory. Minimum description length (MDL) has been proved to be one of the effective trajectory segmentation methods. For trajectory segmentation, first, the feature points are found in a trajectory, and then the adjacent feature points are connected to form a new trajectory. This paper presents an IMDL approach to simplify the trajectory.
According to the definitions of reference [19], first, the distance between different trajectory segments is defined, and then the trajectory segmentation problem is formally described. The distance between line segments consists of three parts: the vertical distance, the parallel distance, and the angular distance. The trajectory can be represented as line segments in three-dimensional space; T is a trajectory sequence, T = (p_1, p_2, …, p_n), where p_i represents the position of trajectory T at time t_i. Figure 3 shows the definitions of the three line-segment distances.
In Figure 3, any two trajectory segments are represented by line segments L_i and L_j. The vertical distance, parallel distance, and angular distance between L_i and L_j are shown in Equation (1). l_⊥1 denotes the Euclidean distance between the point s_j and its vertical mapping point p_s on line segment L_i, and l_⊥2 denotes the Euclidean distance between e_j and p_e. ||L_j|| denotes the length of the line segment, and θ (0° ≤ θ ≤ 180°) is the smaller angle between the line segments L_i and L_j.
According to the above definitions, the distance between two line segments is expressed in Equation (2):

dist(L_i, L_j) = w_⊥ · d_⊥ + w_∥ · d_∥ + w_θ · d_θ, (2)

where w_⊥, w_∥, and w_θ are the weights corresponding to the vertical distance, the parallel distance, and the angular distance, and they are usually set to 1.
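The component distances and the weighted sum of Equation (2) can be sketched as follows. This is a minimal illustration following the formulation of [19], not the paper's code: the function name `segment_distance`, the choice of the longer segment as L_i, and the simplification of always taking the smaller angle are assumptions.

```python
import numpy as np

def _project(p, a, b):
    # vertical mapping (perpendicular projection) of point p onto the line a-b
    ab = b - a
    return a + np.dot(p - a, ab) / np.dot(ab, ab) * ab

def segment_distance(si, ei, sj, ej, w_perp=1.0, w_par=1.0, w_theta=1.0):
    """Weighted segment distance of Equation (2), built from the vertical,
    parallel, and angular components of Equation (1)."""
    si, ei, sj, ej = (np.asarray(v, dtype=float) for v in (si, ei, sj, ej))
    if np.linalg.norm(ei - si) < np.linalg.norm(ej - sj):
        si, ei, sj, ej = sj, ej, si, ei       # let L_i be the longer segment
    ps, pe = _project(sj, si, ei), _project(ej, si, ei)
    l_perp1 = np.linalg.norm(sj - ps)         # distance from s_j to p_s
    l_perp2 = np.linalg.norm(ej - pe)         # distance from e_j to p_e
    d_perp = 0.0 if l_perp1 + l_perp2 == 0 else (
        (l_perp1 ** 2 + l_perp2 ** 2) / (l_perp1 + l_perp2))
    d_par = min(np.linalg.norm(ps - si), np.linalg.norm(pe - ei))
    u, v = ei - si, ej - sj                   # direction vectors
    cos_t = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    sin_t = np.sqrt(max(0.0, 1.0 - cos_t ** 2))
    d_theta = np.linalg.norm(v) * sin_t       # ||L_j|| * sin(theta)
    return w_perp * d_perp + w_par * d_par + w_theta * d_theta
```

For example, two parallel unit-offset segments yield a distance of exactly the vertical component, since their parallel and angular components vanish.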
The MDL consists of two parts: L(H) and L(D|H). Here, H means hypothesis and D means data. L(H) describes the length of the hypothesis, and L(D|H) is the code length obtained by encoding data D with the help of hypothesis H. When the sum of the two is minimal, H is the best hypothesis to explain data D.
There is a disadvantage in this method: it can only find an approximate trajectory segmentation that makes L(H) + L(D|H) smaller. However, in actual applications, there are certain requirements for the accuracy and simplicity of the trajectory. This paper therefore presents an IMDL method, which makes the accuracy and simplicity of the trajectory meet these requirements [20]. We find a segmentation that minimizes mL(H) + nL(D|H) instead of L(H) + L(D|H), where m, n ≥ 0, and the values of m and n are adjusted according to the specific requirements to obtain the best segmentation. L(H) and L(D|H) are shown in Equation (3). Hence, L(H) denotes the sum of the lengths of all trajectory partitions, and L(D|H) denotes the sum of the differences between a trajectory and the set of its trajectory partitions. Then, all the points in a trajectory are converted into an undirected complete graph; namely, the trajectory segmentation problem is transformed into the shortest-path problem on the undirected complete graph, where the weight w(p_i p_j) of edge p_i p_j is calculated by mL(H) + nL(D|H). Finally, the best segmentation can be obtained by computing the weights between any two nodes in the undirected complete graph.

Figure 4 shows a simple example of converting the best segmentation problem into a shortest-path problem. This trajectory contains four points, and for any two of them L(H) and L(D|H) can be calculated; hence, the weight on each edge can also be calculated. The best segmentation is then the shortest path in the undirected complete graph from p_1 to p_4.
Next, the trajectory segmentation algorithm, named IMDL, is depicted in Algorithm 1.
The basic idea of the above IMDL algorithm is as follows. First, a segmented trajectory set C is defined, the first trajectory point of each trajectory is added to the set C, and the variables are initialized (Lines 01-03). Then, all the trajectory points are traversed (Line 04). According to the actual requirements for the accuracy and simplicity of trajectories, the sizes of the parameters m and n are determined (Line 05). In addition, the characteristic points are determined by calculating the cost function: the L(H) and L(D|H) of two position points are calculated using Equation (3), and L(H) + L(D|H) is replaced with mL(H) + nL(D|H) in order to find a segmentation that minimizes mL(H) + nL(D|H) (Line 06). Then, the cost function after trajectory sequence segmentation is compared with the cost function generated without segmentation. If the former is greater than the latter, the previous position point in the trajectory sequence is put into set C as a feature point; otherwise, execution continues (Lines 07-14). Finally, the segmented trajectory data set C is output (Lines 15-16).
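The shortest-path view of IMDL described above can be sketched as follows. This is a hedged illustration rather than the paper's Algorithm 1: the per-edge costs of Equation (3) are approximated here by a log-length term for L(H) and by logarithms of perpendicular deviations for L(D|H), and since every edge points forward along the trajectory, simple dynamic programming stands in for a general shortest-path search. The names `edge_cost` and `imdl_partition` are assumptions.

```python
import math

def edge_cost(points, i, j, m=1.0, n=1.0):
    # w(p_i p_j) = m*L(H) + n*L(D|H): L(H) as log2 of the chord length,
    # L(D|H) approximated via the perpendicular deviations of skipped points
    (x1, y1), (x2, y2) = points[i], points[j]
    length = math.hypot(x2 - x1, y2 - y1)
    cost_h = math.log2(length) if length > 0 else 0.0
    cost_dh = 0.0
    for k in range(i + 1, j):
        xk, yk = points[k]
        dev = abs((x2 - x1) * (y1 - yk) - (x1 - xk) * (y2 - y1)) / max(length, 1e-12)
        cost_dh += math.log2(1.0 + dev)
    return m * cost_h + n * cost_dh

def imdl_partition(points, m=1.0, n=1.0):
    """Best segmentation as the shortest path from the first point to the
    last; forward-only edges let plain DP replace Dijkstra's algorithm."""
    N = len(points)
    best, prev = [math.inf] * N, [0] * N
    best[0] = 0.0
    for j in range(1, N):
        for i in range(j):
            c = best[i] + edge_cost(points, i, j, m, n)
            if c < best[j]:
                best[j], prev[j] = c, i
    C, j = [N - 1], N - 1                 # recover the characteristic points
    while j:
        j = prev[j]
        C.append(j)
    return [points[i] for i in reversed(C)]
```

Increasing m penalizes long hypothesis descriptions and so favours fewer, longer segments, while increasing n penalizes deviation and keeps more of the original points.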
Applying the trajectory segmentation algorithm, on the one hand, improves the efficiency of the prediction algorithm; on the other hand, it promotes prediction accuracy by extracting the features of long trajectory sequences while retaining the original effective information.
In view of the trajectory segmentation algorithm proposed above, the algorithm flow of the trajectory pre-processing phase is depicted in Algorithm 2.
For Algorithm 2, first, the trajectory data are cleaned and denoised: trajectory sequences that are empty or longer than a specified threshold (e.g. 1000 points, according to the experimental settings) are found and deleted (Line 01). Then, the trajectory sequence is simplified using the IMDL trajectory segmentation method (Line 02), which is shown in Algorithm 1. In addition, the trajectory sequence data set is divided by a sliding window (Lines 03-04). Finally, the training data set X and the labels y are output (Line 05).
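The cleaning and sliding-window split of Algorithm 2 can be sketched as follows; the function name `make_dataset`, the window size, and the label offset are illustrative assumptions, not the paper's exact listing.

```python
def make_dataset(sequences, window=5, label_offset=1, max_len=1000):
    """Clean the sequences and slide a fixed-size window over each one:
    each window is a sample X, and the point label_offset steps ahead of
    the window is its label y."""
    X, y = [], []
    for seq in sequences:
        if not seq or len(seq) > max_len:   # cleaning rule of Line 01
            continue
        for i in range(len(seq) - window - label_offset + 1):
            X.append(seq[i:i + window])
            y.append(seq[i + window + label_offset - 1])
    return X, y
```

For a 10-point sequence and a window of 3, this yields seven (X, y) pairs, the first being the points at indices 0-2 labelled with the point at index 3.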

Feature vector extraction and embedded processing
Through the IMDL trajectory segmentation algorithm of Section 2.1, a collection of segmented trajectory data can be obtained; based on this trajectory segmentation method, the data sparsity problem can be effectively overcome. In this section, we process the sequence after trajectory segmentation. In order to extract the feature vector, we need to convert the position coordinate points in the segmented trajectory sequence into a grid chain, modify the time-stamp data format and, finally, divide the data set. One-hot encoding is an approach to extracting text features in natural language processing, but the one-hot encoding algorithm has the following two defects.
1. When the amount of data is large, it faces the problem of an overly long vector length with too little effective information.
2. It cannot show the movement pattern of moving objects, and the relationship between geographical locations cannot be reflected.
Although we have simplified the trajectory, the dimension of the encoded trajectory sequence vector is still relatively high for the neural network model. In view of the defects of one-hot encoding, we use embedded technology to handle the discrete input values: namely, the input trajectory sequence is mapped to an embedded vector. Based on this idea, we propose a novel prediction algorithm named EP-LSTM, a deep learning prediction model based on embedded technology combined with LSTM.
Before introducing the EP-LSTM algorithm, we first introduce the embedded technology. It is the word2vec technique, which is often used in the natural language processing field. Word2vec can express the similarity and analogy between different words well; essentially, it is a kind of shallow neural network. In terms of implementation, word2vec is divided into two frameworks: negative sampling and hierarchical softmax. In the present study, hierarchical softmax is selected to realize word2vec, and it mainly involves two models: the continuous bag-of-words (CBOW) model and the continuous skip-gram model. The CBOW model predicts the target word based on its context, while the skip-gram model predicts the context based on the target word. Because the destination prediction of a trajectory is based on the historical trajectory data, CBOW is a suitable model to implement with hierarchical softmax.
The CBOW model consists of three layers: the input layer, the projection layer, and the output layer; it is a three-layer neural network. Compared with DNNs, it does not need to calculate all the non-leaf nodes; only the node parameters on the path are updated by partial derivatives each time, which improves the efficiency of word-vector training. The model is equivalent to multiplying the vector of a bag-of-words model by an embedding matrix, resulting in a continuous embedding vector. Its framework is shown in Figure 5.
As shown in Figure 5, the mapping idea between layers in the CBOW model is as follows.
1. Input layer: the one-hot encoding method is used to form a low-dimensional feature vector X, and then the feature vector is updated by linear transformation. Here, W is a dictionary composed of all words. The linear transformation is as shown in Equation (4).
2. Projection layer: the input matrix is mapped to a vector by summing and averaging the corresponding dimensions, as shown in Equation (5).
3. Output layer: the occurrence frequency of each word is taken as the weight, and then the Huffman tree is constructed and output according to the weight and model parameters of each category, as shown in Equation (6).
With regard to the calculation of the gradient, the CBOW model uses stochastic gradient ascent to update each parameter. Therefore, the maximum likelihood function is used to solve for the parameters on the binary tree, namely, the vectors on the non-leaf nodes. Hierarchical softmax constructs a classification binary tree using Huffman encoding, which reduces the computational complexity. When N training samples are trained, the number of occurrences of each category corresponds to its weight in the Huffman tree: for samples with more occurrences the path is shorter, and for samples with fewer occurrences the path is longer.
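The Huffman-tree property just described, where frequent categories receive shorter paths, can be illustrated with a small standard-library sketch; `huffman_code_lengths` is an illustrative name, and only code lengths (tree depths) are computed, not the full codes.

```python
import heapq

def huffman_code_lengths(freqs):
    """Return the Huffman code length (tree depth) of each symbol given its
    frequency: frequent symbols end up on shorter paths, rare ones longer."""
    heap = [(f, i, [sym]) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    depth = {sym: 0 for sym in freqs}
    tiebreak = len(heap)                  # keeps tuple comparison well defined
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)
        f2, _, group2 = heapq.heappop(heap)
        for sym in group1 + group2:
            depth[sym] += 1               # merging pushes these leaves down a level
        heapq.heappush(heap, (f1 + f2, tiebreak, group1 + group2))
        tiebreak += 1
    return depth
```

With the classic frequency table {a: 45, b: 13, c: 12, d: 16, e: 9, f: 5}, the most frequent symbol 'a' gets a one-bit code while the rarest symbols need four bits.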
The main idea of CBOW is to maximize the probability of p(w|Context (w)). With the training of the neural network, the parameters are constantly updated until a suitable weight vector is found, so that the conditional probability p(w|Context (w)) is the largest. The result of multiplying the weight vector by the input matrix is the word vector corresponding to the input value, which prepares for building the LSTM model.
The objective function based on the neural network model usually uses the log-likelihood, as shown in Equation (7).
The key is to construct the conditional probability p(w|Context(w)). The essence of hierarchical softmax is to turn an N-way classification problem into a series of binary classifications. The training idea of CBOW is to decompose the complex normalized probability into a series of conditional probabilities, as shown in Equation (8).
The conditional probability of each layer corresponds to a binary classification problem. This can be fitted by a simple logistic regression function, which transforms the probability normalization problem into a probabilistic fit problem.
As aforementioned, the trajectory sequence is transformed into an embedded vector sequence through embedded technology and then the feature vector is extracted. This method effectively solves the problem of sparse data and information loss caused by one-hot vectors. Because the embedded vector sequence is low-dimensional and contains hidden vectors between locations, it can provide a better training set for the destination prediction model.
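The embedding step described above can be sketched with a minimal CBOW trainer over grid-cell IDs. This is a simplified assumption-laden illustration: `train_cbow`, its hyper-parameters, and the use of plain softmax in place of the paper's hierarchical softmax are all simplifications for brevity, not the paper's implementation.

```python
import numpy as np

def train_cbow(sequences, vocab_size, dim=16, window=2, lr=0.05, epochs=30, seed=0):
    """Minimal CBOW over grid-cell sequences: average the context vectors
    (projection layer), score every cell, and take a gradient step.  Plain
    softmax stands in for hierarchical softmax."""
    rng = np.random.default_rng(seed)
    W_in = rng.normal(0.0, 0.1, (vocab_size, dim))   # embedding matrix
    W_out = rng.normal(0.0, 0.1, (dim, vocab_size))
    for _ in range(epochs):
        for seq in sequences:
            for t, target in enumerate(seq):
                lo, hi = max(0, t - window), min(len(seq), t + window + 1)
                ctx = [seq[j] for j in range(lo, hi) if j != t]
                if not ctx:
                    continue
                h = W_in[ctx].mean(axis=0)           # projection layer
                z = h @ W_out
                p = np.exp(z - z.max()); p /= p.sum()
                p[target] -= 1.0                     # cross-entropy gradient
                g_in = (W_out @ p) / len(ctx)
                W_out -= lr * np.outer(h, p)
                for c in ctx:
                    W_in[c] -= lr * g_in
    return W_in   # row i is the embedded vector of grid cell i
```

After training, a trajectory of cell IDs maps to the low-dimensional embedded vector sequence `W_in[seq]`, which is what the LSTM consumes.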

EP-LSTM PREDICTION MODEL
More recently, destination prediction methods based on neural network models have attracted more and more attention. LSTM is a neural network based on the RNN. The gradient of an RNN tends to disappear or explode after many stages of transmission, so it always faces the challenge of long-term dependence. LSTM is one of the models addressing long-term dependence [17], and its core is the addition of a special unit: the memory module. The unit structure of LSTM is shown in Figure 6. X_t denotes the input value at time t; h_t denotes the output at time t; h_{t−1} denotes the output of the previous cell state; i, f, and o are the input gate, the forget gate, and the output gate; C_{t−1} is the state value of the LSTM unit at time t − 1; C_t is the state value of the LSTM unit at time t; and σ represents the sigmoid activation function, which converts signals to a value between 0 and 1. Each tanh layer generates a vector for updating the input content. ⊕ denotes matrix addition, and ⊗ denotes element-wise multiplication of the operand matrices.
There are three main stages within LSTM:
1. Forgetting stage: the forget gate (f_t). The function of the forget gate is to forget information. This stage is made up of a sigmoid neural network layer and element-wise multiplication. Multiplying C_{t−1} element-wise by f_t discards the information to be forgotten, and multiplying i_t element-wise by C̃_t picks out the information to be updated from the candidate value vector (as shown in Equation (9)).
2. Selective memory stage: the input gate (i_t). The function of the input gate is to memorize selectively. This stage consists of the input gate with a neural network layer and element-wise multiplication. First, the input gate calculates the state to be input, represented by i_t, where the sigmoid function determines the part that needs to be updated. A new candidate variable C̃_t is created by a tanh layer, whose value range is (−1, 1). As shown in Equation (10), the information to update is selected by element-wise multiplication of the input gate with the candidate variable C̃_t.
3. Output stage: the output gate (o_t). The function of the output gate is to decide what will be used as the output of the current state. At this stage, the output gate, tanh, the neural network layer, and element-wise multiplication operate together to transfer the cell state and input signal to the output. Namely, C_t is input into the tanh function, which compresses all values between −1 and 1, as shown in Equation (11).
Here, let X_t be the input at the current time t; through the above gating structure, the cell state C is given by the corresponding equations. LSTM is a special RNN whose core idea is to control the transmitted state through gates: it remembers information that needs to be remembered for a long time and forgets unimportant information, mainly to solve the gradient disappearance and gradient explosion problems during long-sequence training. Next, starting from the core part of LSTM, the training process of the destination prediction algorithm is described in detail, as shown in Figure 7.
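The gate equations (9)-(11) can be written compactly as a single step function; the following numpy sketch stacks the four gate weight matrices into one matrix W, a layout chosen here for brevity and not prescribed by the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Equations (9)-(11).  W stacks the weight
    matrices of the four gates acting on [h_{t-1}, x_t]; b stacks the biases."""
    H = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:H])                 # forget gate, Eq. (9)
    i_t = sigmoid(z[H:2 * H])             # input gate
    c_hat = np.tanh(z[2 * H:3 * H])       # candidate state, Eq. (10)
    o_t = sigmoid(z[3 * H:4 * H])         # output gate
    c_t = f_t * c_prev + i_t * c_hat      # keep old memory, add new
    h_t = o_t * np.tanh(c_t)              # output, Eq. (11)
    return h_t, c_t
```

With all-zero weights, each gate evaluates to 0.5, so exactly half of the previous cell state is carried forward, which makes the additive memory path easy to verify.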
In order to simplify the model, only one hidden layer is drawn. The original trajectory data are preprocessed with the IMDL method to obtain the collection of historical trajectory sequences. A trajectory sequence composed of multiple positions can then be obtained by adopting the embedded technology and is represented by X = {e_1, e_2, e_3, …, e_n}. By randomly selecting a certain number of sample data from X and setting the p positions after the samples as the label y, X and y are used as the input of the model. The input is transmitted from the input layer to the LSTM neural network, linearly connected by N hidden layers. If the number of samples is set to k, the trained vector sequence y = {y_{k+1}, y_{k+2}, …, y_{k+p}} can be obtained through the calculation of the activation function and the backpropagation mechanism. Among them, the activation function of the hidden layer from time t = 1 to time t = T is continuously iterated by Equation (12), where C_t denotes the state value of the memory unit at the current moment, W denotes the weight between the input layer and the hidden layer, b denotes the bias vector, and σ denotes the activation function of the hidden layer. When the vector sequence passes to the output layer, the output result is the position corresponding to each position of the embedded vector sequence. Therefore, the embedded position vector of the output can be obtained by putting the trajectory sequence to be predicted into the trained model. Finally, the Euclidean distance is calculated between the output sequence and all the vectors in the input sequence, and the embedded position vector with the smallest distance is the prediction result.
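The final nearest-vector step described above can be sketched as follows; `decode_destination` and the sample coordinates are illustrative assumptions, with `cell_coords` standing for whatever mapping from grid cells back to locations the pipeline maintains.

```python
import numpy as np

def decode_destination(pred_vec, embedding, cell_coords):
    """Map the model's output vector back to a location: pick the grid cell
    whose embedded vector has the smallest Euclidean distance to pred_vec."""
    dists = np.linalg.norm(embedding - pred_vec, axis=1)
    cell = int(np.argmin(dists))
    return cell, cell_coords[cell]
```

For instance, an output vector near the embedded vector of cell 1 decodes to cell 1 and its stored coordinates.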
In the field of trajectory prediction of moving objects, LSTM is one of the important models for solving long-term dependence problems. However, the hidden layer of LSTM is greatly affected by the length of the data sequence. We use the embedded technology introduced in Section 2.3 to transform high-dimensional input vectors into word vectors, which also transforms trajectory sequences into distributed vector sequences to reconstruct the LSTM model. In this paper, the prediction algorithm EP-LSTM is proposed, which combines the embedded technology with LSTM to overcome the gradient disappearance of the basic LSTM model when dealing with long trajectory sequences and to improve the accuracy of destination prediction, as depicted in Algorithm 3.
The basic idea of the above EP-LSTM algorithm is as follows. First, the training parameters are initialized (Line 01), and then the graph structure of the neural network is constructed (Lines 02-11). Here, a multi-layer LSTM structure is constructed to process the embedded position vectors (Lines 02-06). In addition, we use the embedded technology based on the word2vec principle to convert the trajectory sequence into word vectors, and the elements corresponding to the indices in the trajectory sequence are selected as the input of the model. Then, the preprocessed data are used to train the model and the loss function is calculated; through a number of iterations, the errors corresponding to different iterations can be calculated (Lines 07-11). Finally, the best training model is selected and output (Lines 12-13).

The destination prediction algorithm based on EP-LSTM is given below, as depicted in Algorithm 4.
The basic idea of the above destination prediction algorithm is as follows. First, the test trajectory sequence is preprocessed based on Algorithm 2 (Line 01). Then, the test trajectory sequence is transformed into an embedded position vector to be predicted (Lines 02-03). In addition, we use the trained model of Algorithm 3 and input the embedded position vector to be predicted into the model for prediction (Lines 04-05). Next, by traversing each trajectory sequence and calculating the Euclidean distance, the embedded vector closest to the real destination is found within the limited threshold range; the position onto which this vector maps is the most likely location of the destination (Lines 06-09). Finally, the predicted destination is output (Line 10).

Data set and test environment
In our study, we focus on the Porto data set, released in the context of the ECML/PKDD 2015 challenge and hosted as a Kaggle competition [18], which allows us to easily compare our model with the winner's approach. It provides data of 442 taxis in Porto over one year. Sample data from the experimental data set are shown in Table 1, and Table A.1 in Appendix 1 gives a detailed description of each feature name. All algorithms were implemented in the Anaconda3 IDE using the Python programming language. The hardware environment included an Intel(R) Xeon(R) E5-2620 v3 2.40 GHz CPU with 64 GB RAM.

4.2
Performance evaluation
Definition 1. (Distance error calculation): This paper uses the Haversine distance to calculate the distance deviation between the real destination and the predicted destination, and the deviation is used as the standard to measure the prediction model. It measures the distance between two points on the sphere according to longitude and latitude, as shown in Equation (13):

$$\mathrm{dis}_{haversine} = 2R\,\arctan\!\sqrt{\frac{h}{1-h}}, \qquad h = \sin^{2}\frac{\lambda_{2}-\lambda_{1}}{2} + \cos\lambda_{1}\cos\lambda_{2}\,\sin^{2}\frac{\varphi_{2}-\varphi_{1}}{2} \tag{13}$$

where λ1 and λ2 are the latitudes of the two coordinate points, φ1 and φ2 are the longitudes of the two coordinate points, dis_haversine denotes the Haversine distance, and R is the radius of the earth, set to 6371 km.
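A minimal Python implementation of the Haversine distance in its arctan (atan2) form, assuming coordinates in degrees and R = 6371 km:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, R=6371.0):
    """Haversine distance (km) between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # The arctan (atan2) form of Equation (13).
    return 2 * R * math.atan2(math.sqrt(h), math.sqrt(1 - h))

# A quarter of a great circle along the equator: pi/2 * R ≈ 10007.5 km.
print(round(haversine_km(0, 0, 0, 90), 1))  # 10007.5
```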

Definition 2. (Prediction accuracy): Let the real trajectory sequence be T = (p1, p2, …, pn) and the predicted trajectory sequence be T^p = (T^p_1, T^p_2, …, T^p_n). The prediction accuracy is defined in Equation (14):

$$accuracy = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\left[d\!\left(T_{i},\,T_{i}^{p}\right) < \sigma\right] \tag{14}$$

where T^p_i represents the predicted result of the i-th sample, T_i is the real result, d(p, q) represents the Euclidean distance between p and q in space, and σ represents the distance threshold, such as 1 km. When d(T_i, T^p_i) < σ, the prediction is considered accurate within the threshold range, that is, the predicted destination is close to the real destination.
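Equation (14) can be computed as below; the sample points and the unit of σ are illustrative:

```python
import math

def prediction_accuracy(real, predicted, sigma=1.0):
    """Fraction of predictions within sigma of the truth, following
    Equation (14); points are (x, y) pairs in the same unit as sigma."""
    hits = sum(
        1 for p, q in zip(real, predicted)
        if math.dist(p, q) < sigma      # Euclidean distance d(p, q)
    )
    return hits / len(real)

# Toy data: two of the three predictions fall within the 1.0 threshold.
real = [(0.0, 0.0), (2.0, 2.0), (5.0, 5.0)]
pred = [(0.3, 0.4), (2.1, 2.0), (9.0, 9.0)]
print(prediction_accuracy(real, pred, sigma=1.0))
```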

Model training and test results analysis
In order to evaluate the effectiveness of the proposed algorithm, this section uses the distance error criterion of Section 4.2 to quantify its performance. Based on the observation that the local portions near the start and the end of a trajectory play more significant roles in destination prediction, we do not take the complete trajectory as input, but intercept the trajectory information near the starting point and the end point according to a general partition ratio (such as 30%). In the training phase, the error decreases gradually as the number of iterations increases; after a certain number of iterations, the training error stabilises and model training ends. The results are given in Table 2.
As shown in Table 2, we mainly compare EP-LSTM with the neural network-based models of [21], which won the competition. The table shows the sum of the errors over all training set trajectories, calculated with the distance measure of Equation (13). Model 1 is the champion model of the ECML/PKDD competition, which predicts the travel destination of moving objects through a multilayer perceptron (MLP) and mean-shift clustering. Models 2, 3, and 4 are improved MLP-based models, which predict destinations by embedding other attributes or performing internal conversion processing. Model 5 reads the GPS points of the trajectory with an RNN and predicts the destination of the moving object directly from the last internal state. Models 6 and 7 use a bidirectional RNN, a variant of RNN composed of two RNNs superimposed together, whose output is determined by the states of both RNNs. From the test error results, we can see that the EP-LSTM algorithm proposed in the present study improves considerably on all seven models.
For model training, we used the deep learning library TensorFlow to build the LSTM model and chose the parameters of the EP-LSTM algorithm after many experiments. To train the network, we used the Adagrad optimizer to obtain the final prediction model; Adagrad is an adaptive gradient algorithm that automatically adjusts the learning rate during training. The network is trained with the mean squared error (MSE) as the loss function, using early stopping. During the test phase, we use the LSTM parameters that produced the best MSE score on the validation set. The parameters of the EP-LSTM algorithm are given in Table 3.
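To illustrate how Adagrad adapts the step size, here is a single-vector sketch of its update rule (not the TensorFlow implementation used in the paper; the toy loss and learning rate are illustrative):

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One Adagrad update: accumulated squared gradients scale the
    learning rate down for frequently-updated parameters."""
    accum += grad ** 2
    w -= lr * grad / (np.sqrt(accum) + eps)
    return w, accum

w = np.array([1.0, 1.0])
accum = np.zeros_like(w)
for _ in range(3):
    grad = 2 * w                 # gradient of the toy loss ||w||^2
    w, accum = adagrad_step(w, grad, accum, lr=0.1)
print(w)  # both components shrink toward 0 at an adapted rate
```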
The main parameters used to train the EP-LSTM model are reported in Table 3. During the experiments, in order to keep the training process relatively stable, we chose a fixed learning rate schedule. The determined initialization parameters and the GPS trajectory data are the input to the algorithm, and the embedded vectors corresponding to the trajectory data are the input to the LSTM model.
As shown in Table 3, learning_rate_base denotes the base learning rate, learning_rate_decay denotes the decay rate, vocabulary_size represents the dimension of the input data converted into word vectors, embedding_size denotes the dimension of the embedding vectors, and num_hidden represents the number of neurons in the hidden layer. In addition, we chose the CBOW model with a window size (context_window) of 6 and without filtering words on frequency; input_size denotes the length of the input data, and num_sampled denotes the size of the data input into the loss function.
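A toy forward pass of the CBOW model on grid-cell ids, with hypothetical sizes in place of the Table 3 values (W_in and W_out are randomly initialised stand-ins for trained embeddings):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, embed_dim = 50, 8        # toy sizes, not the Table 3 values
W_in = rng.normal(size=(vocab_size, embed_dim))   # input embeddings
W_out = rng.normal(size=(vocab_size, embed_dim))  # output embeddings

def cbow_scores(context_ids):
    """CBOW forward pass: average the context embeddings, then score
    every vocabulary entry; the highest score is the predicted centre."""
    h = W_in[context_ids].mean(axis=0)   # averaged context vector
    return W_out @ h                     # one score per cell id

context = [3, 17, 17, 42]    # grid cells surrounding the target position
scores = cbow_scores(context)
print(int(np.argmax(scores)))   # id of the most probable centre cell
```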
Next, we conduct experiments to train EP-LSTM models. In order to test the prediction accuracy of the EP-LSTM algorithm, we use the same data split as reference [21], randomly selecting 19,427 trajectories from the original set as the validation set, 19,770 trajectories as the test set, and the rest as the training set. After training is completed, the validation set is used to verify the model. The result is shown in Figure 8.
As shown in Figure 8, the vertical axis represents the error value, and the horizontal axis represents the number of iterations. The solid and dashed lines, respectively, represent the training error and verification error. The smaller the error value, the better the training model. As can be seen from Figure 8, as the number of iterations increases, the error value in the validation data set is always smaller than that in the training data set, namely, the prediction effect is better, which avoids the overfitting in the training process. We can also find that when the number of iterations reaches 2000, the difference in the error value between the validation data set and the training data set is the largest. Moreover, when the number of iterations is 12,000, the error value tends to be stable, reaching about 0.31. As a consequence, the trajectory segmentation algorithm of IMDL effectively solves the data sparsity problem.

Comparison and analysis
In this section, we conduct experiments to evaluate the effectiveness of the EP-LSTM algorithm and compare it with existing algorithms. At present, there are several related algorithms for trajectory destination prediction. The most common trajectory matching method is the Baseline algorithm [22], which matches the query trajectory against the historical trajectories. However, this algorithm has a major drawback: when there is no matching trajectory in the history, it is difficult to meet the needs of trajectory prediction in real life. There is also the Markov prediction algorithm [23], which establishes a transition matrix by calculating the transition probabilities between user locations and then infers the most likely destination. In addition, Xue et al. proposed the SubSyn algorithm [7], which decomposes historical trajectories into sub-trajectories of two adjacent positions and splices the sub-trajectories into complete trajectories for prediction.
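As a sketch of the Markov-style approach [23] (first-order here for brevity; the comparison below uses its second-order variant 2-Markov), a transition table over symbolic locations can be built and queried as follows, using toy trajectories:

```python
from collections import Counter, defaultdict

# Toy trajectories over symbolic locations (hypothetical data).
trajectories = [
    ["home", "mall", "work"],
    ["home", "mall", "work"],
    ["home", "park", "home"],
]

# Count transitions between consecutive locations.
counts = defaultdict(Counter)
for traj in trajectories:
    for a, b in zip(traj, traj[1:]):
        counts[a][b] += 1

def predict_next(loc):
    """Most probable next location under the first-order Markov model."""
    nxt = counts[loc]
    return max(nxt, key=nxt.get) if nxt else None

print(predict_next("mall"))   # "work" follows "mall" in every trajectory
print(predict_next("home"))   # "mall" wins 2 of the 3 transitions
```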
In order to prove that the EP-LSTM algorithm effectively solves the problem of low hit ratio, we use the prediction accuracy to quantify the performance of the model. Therefore, we report the prediction accuracy and prediction time between the EP-LSTM algorithm and the existing algorithms, including SubSyn, Baseline, and 2-Markov (second-order Markov chain). Comprehensive experiments based on real trajectory data show that EP-LSTM can achieve higher accuracy than the existing algorithms. The results are shown in Figures 9 and 10.
As shown in Figure 9, the vertical axis represents the prediction accuracy of the four algorithms EP-LSTM, SubSyn, Baseline, and 2-Markov, and the horizontal axis represents the number of trajectories of moving objects. As can be seen from Figure 9, the accuracy of all algorithms improves appreciably as the number of trajectories in the training set increases. However, compared with the other three algorithms, the prediction accuracy of EP-LSTM is significantly better. From the experimental results, we find that when the number of trajectories reaches 12,000, the prediction accuracy reaches 82.4%, which shows the effectiveness of the destination prediction algorithm proposed in the present study. It also indicates that introducing trajectory segmentation and embedded processing is a desirable methodology.
As we know, the time complexity of an algorithm largely reflects its efficiency: the lower the time complexity, the higher the efficiency. By definition, time complexity here refers to the time spent running the algorithm, excluding training and verification time. Therefore, in order to evaluate the runtime efficiency of EP-LSTM, we adopt prediction time as the evaluation metric. The experimental data are 320 trajectories from test.csv, and the batch size (batch_size) is 5. The result is shown in Figure 10.
As shown in Figure 10, the vertical axis represents the prediction time per batch_size, and the horizontal axis represents the four algorithms. According to the experimental results, the Baseline algorithm is the most time-consuming, because the entire experimental data set must be traversed when matching the query trajectory against the historical trajectories. 2-Markov and SubSyn have relatively short prediction times, because the time and space complexity of processing the sample data is lower. The prediction time of the EP-LSTM algorithm is lower than that of Baseline and higher than that of SubSyn and 2-Markov, because training a neural network requires many matrix operations and its time complexity is relatively high. Overall, the prediction effect of the EP-LSTM algorithm is the best: without neglecting the movement patterns of moving objects, it effectively addresses the data sparsity problem of destination prediction.

CONCLUSION AND FUTURE WORK
Destination prediction of moving objects is an important part of LBS, which has attracted more and more attention due to its broad applications, such as traffic navigation, virus transmission, hotel recommendation, and personalized advertising.
However, trajectory data sparsity and long-term dependence remain persistent problems in this area. To address them, we simplify the trajectory by introducing the trajectory segmentation method IMDL. Then, embedded vectors are used to extract feature vectors. Finally, the data set is trained in the LSTM model to establish the EP-LSTM algorithm, which performs destination prediction. While ensuring the availability of data, we improve the accuracy of destination prediction and effectively mitigate the adverse effects of data sparsity and long-term dependence. We report a series of experiments on a real-world location history data set from the ECML/PKDD Discovery Challenge 2015, showing that the EP-LSTM algorithm achieves better prediction accuracy than the existing algorithms.
In terms of our future work, we plan to improve the time performance of the EP-LSTM algorithm. In addition, this algorithm does not consider the objective factors such as speed, road condition, and weather of moving objects. Therefore, we will analyze the factors mentioned above in the future work to improve the prediction accuracy of EP-LSTM.