Evaluation of map-matching algorithms for smartphone-based active travel data
Abstract
Global Positioning System (GPS) data on walking and cycling trips can generate useful insights for transportation systems but require substantial processing. One of the key GPS data processing steps is “map-matching”, or inference of the sequence of network links traversed during travel. The objective of this research is to evaluate the accuracy of existing map-matching algorithms for GPS data on active travel. A method to flag erroneous map-matching results without requiring ground-truth data and improvements for active travel data are also proposed. Six map-matching algorithms are applied to a sample of 63 trajectories, stratified on network density and average heading change, extracted from a large set of real-world trips from metropolitan Vancouver, Canada. Results show that the best-performing method is PgMapMatch, which can be further improved by adjustments to link costs and allowing wrong-way travel. Two other algorithms have similarly accurate routes (70–90% accuracy, depending on the measure), but fail to generate routes for about a third of trips. The proposed error detection measure can be used (without ground-truth data) to flag matched routes requiring visual inspection, with a recommendation to look for wrong-way travel, missing links in the network data, and parallel facilities on the same street.
1 INTRODUCTION
Active travel (primarily walking and cycling) is essential to sustainable transportation systems. To encourage the use of active travel modes, transportation policies, programs, and plans should be informed by insights from real-world active travel behaviour. The increasingly widespread use of GPS tracking (for fitness applications, navigation systems, household travel surveys, and bike-sharing systems, among others) presents the opportunity to collect an abundance of active travel data from a wide spectrum of travellers.
GPS data provide detailed but imperfect travel records, with location errors and missing observations being common. Substantial data processing is required to account for GPS data errors as well as infer additional features of the recorded trips [1]. The typical steps to process GPS data are (1) filtering, (2) trip identification, (3) inferring travel mode and purpose, and (4) matching GPS records to the underlying street network (map-matching) [2]. The input to these successive processing steps is a GPS trajectory: a temporally ordered sequence of the traveller's location coordinates [3]. The basic data provided by GPS are timestamps and spatial coordinates (longitude, latitude, and sometimes altitude) for discrete observations. Additional information is sometimes provided from the GPS device (such as the number of satellites in view at the time of recording [4]) or from other sensors (such as accelerometry [5]).
The focus of this paper is map-matching for active travel, a crucial step that enables the integration of GPS data with network information such as facility type, traffic volume, and road grade. Understanding map-matching algorithm performance is vital because the map-matching results will impact the successive analysis of travel behaviour [3]. Mismatched routes could, for example, lead to erroneous data on walking and cycling volumes or incorrect inference of traveller preferences from route choice models. Various methods for map-matching have been used in past studies on walking and cycling, but there has been no rigorous comparison of their performance for active travel data. Thus, existing literature provides little justification for the selection of any one particular map-matching algorithm, and almost no information about its transferability to other datasets (e.g. see [7, 8]). Map-matching can be particularly challenging for GPS data from active travel because pedestrians and cyclists use travel paths that may not be represented in the street network data (such as shortcuts through parks). In addition, pedestrians and cyclists use a diverse combination of (sometimes parallel) facilities including sidewalks, multi-use paths, bicycle lanes, and mixed-traffic lanes, and network complexity is a key challenge for the map-matching problem [6].
The objectives of this paper are to (1) evaluate the accuracy of existing map-matching algorithms for smartphone-derived active travel GPS data, (2) examine the potential to improve the best-performing algorithm by tuning it for active travel, (3) propose an error detection method to flag problematic map-matching results in future applications without requiring ground-truth data, and (4) examine sources of error in map-matched routes. This study contributes to the literature by providing the first independent comparative evaluation of map-matching algorithm performance for active travel data. External validation such as reported here is a crucial but often-neglected step in the acquisition of (natural and social) scientific knowledge [9, 10]. We aim to provide specific, actionable information for analysts processing GPS data in research and practice to select an algorithm and evaluate the accuracy of map-matching results. We also aim to inform researchers developing map-matching algorithms by identifying common sources of error from existing algorithms and the characteristics of robust map-matching for walking and cycling trips.
2 LITERATURE REVIEW
2.1 Past evaluations
Map-matching has been an active area of research for many years, and review studies have intermittently summarized and catalogued the various approaches used [11–13]. Map-matching algorithms can be classified into geometric, topological, and advanced algorithms [14–16]. Geometric approaches match GPS records based on the closest network features (node or link) or similarity of GPS trajectory shape to the network links (curve-to-curve matching). Topological approaches can overcome some errors of geometric approaches by additionally considering the connectivity of the network. The third group of map-matching algorithms employs advanced methods such as Hidden Markov Models in addition to geometric and topological approaches to generate the most likely route [17, 18].
Most papers proposing map-matching algorithms include performance measures in their results (usually from cross-validation), but external evaluations and comparisons of map-matching performance are rare. A recent comparison of three map-matching algorithms using GPS data from taxis in Beijing in addition to 100 trajectories from 100 areas around the globe identified GPS errors and network complexity as the key map-matching challenges [6]. Another recent study on road-based navigation systems using GPS records collected at speeds of 5 to 80 km/h concluded that advanced methods (particularly using data processed by Kalman filtering in a topological algorithm) perform better than geometric or topological algorithms [19]. Additional evidence for enhanced performance of advanced algorithms is provided by a study using cycling trajectories that compared a geometric algorithm to a Hidden Markov Model-based advanced technique (with 90% correctly matched links by the advanced technique compared to 70% by the geometric algorithm, both adjusted to prefer cycling facilities) [17].
In general, external validations often yield poorer performance than cross-validation reported by algorithm developers, due to over-calibration and hence poor transferability to new data sets collected in other contexts (modes, networks etc.) [20–22]. Performance can also vary substantially by the measures used to represent algorithm accuracy. Some past studies only report if the map-matching algorithm successfully generates a route for a given trajectory [23], while others use visual inspections [15]. When the ground-truth route is known, studies have evaluated accuracy using a binary variable indicating a complete reproduction of the ground-truth route [14], or a continuous variable indicating the proportion of the ground-truth route correctly reproduced [17]. In the absence of ground-truth routes, studies have compared the generated routes to the GPS trajectory data to evaluate the similarity of attributes such as distance travelled [24].
The literature indicates that advanced methods can generate more accurate routes compared to geometric and topological approaches alone. However, no study has systematically measured and compared the competency of advanced map-matching algorithms for application to GPS data from active travel, across multiple dimensions of performance. Map-matching for active travel is more challenging than for car trips as street network data can lack representation for paths taken by cyclists or pedestrians (for instance, shortcuts through parks [25]). Analysts working with walking and cycling GPS data currently lack evidence for the selection of a specific map-matching algorithm. This study is motivated by the lack of independent comparative evaluations of existing map-matching algorithms for active travel data.
2.2 Selected algorithms
To be included in this study, algorithms had to be published with a reproducible methodology or open-source script and be applicable to basic GPS data (timestamp, latitude, longitude) and network data (link and node locations, topology, and facility types). The most commonly used map-matching methods from the literature were identified by reviewing studies published since 2010 that used cycling GPS data. That search was executed using the Google Scholar search engine in September 2019 and yielded four map-matching algorithms frequently employed in past studies [16, 24, 26, 27]. We also include published algorithms with open-source scripts in Python or R programming languages; a GitHub search in 2019 yielded three more map-matching algorithms with associated published studies [14, 28, 29].
The two algorithms most often used in cycling route choice studies [23, 30–36] were developed by Schuessler and Axhausen [16] and Dalumpines and Scott [26]. The Schuessler and Axhausen algorithm is an extension of Marchal et al. [37] which uses an advanced approach to determine the sequence of links travelled in car trips. In addition to geometric and topological approaches, the algorithm uses a Multiple Hypothesis Technique to evaluate a set of route candidates and choose the most likely route based on a scoring function that consists of the “relevant” perpendicular distance between GPS records and route links with a penalty term for GPS records with speed over the free-flow speed of the associated link. The authors reported 52–53% successfully matched routes. A study applying this algorithm to a cycling dataset reported 76% successfully matched routes after implementing filtering and map-matching steps [35]. The Dalumpines and Scott algorithm is developed in ArcGIS and finds the shortest path between the first and last GPS record on a constrained network defined by a spatial buffer around the GPS trajectory records. The authors reported an accuracy of 88%; in other contexts, cycling studies that employed this algorithm reported success in generating a route for 54–93% of trips and 38–89% accuracy by different measures [25, 30, 31].
The Li [38] algorithm was developed for a cycling route choice study in Toronto. Similar to Dalumpines and Scott, ArcGIS is used for the main part of the algorithm. The algorithm assigns GPS records to links based on their proximity and uses voting based on neighbour records for the records that are close to intersections or located near parallel links. Li et al. [27] reported 63% of records were successfully map-matched by applying the Li algorithm. The Schweizer et al. [24] algorithm was also designed for cycling trips and subsequently employed in a cycling route choice study [39]. The algorithm essentially uses the shortest path algorithm where the cost of links is determined by the presence of GPS records around the links as well as the presence of cycling facilities.
“PgMapMatch” is an algorithm with an open-source Python script developed by Millard-Ball et al. [14] that takes advantage of temporal information in GPS records in addition to using a combined geometric and topological approach. This algorithm gives a measure of map-matching reliability (PgMapMatch score) which is the output from a calibrated logistic regression model. The authors report correct identification of 99% and 87% of used links in two testing datasets.
Perrine et al. [28] is an algorithm with an open-source Python script that finds the shortest path between the projection of GPS records onto links. The algorithm was applied to transit General Transit Feed Specification data and probe-vehicle GPS data with two underlying networks: one used for dynamic traffic assignment and the other extracted from Open Street Map (OSM). The authors report an accuracy of over 99% for GPS data with the OSM network. Another algorithm with an open-source script developed by Camargo et al. [29] for truck GPS data utilizes the shortest path concept by assigning a lower cost to links closer to GPS records. They also allow for the optional inclusion of increased costs for parallel links by using GPS heading information. This algorithm is not included in the evaluation below because it was developed for an older version of Python and is no longer executable.
See Supplementary Material for further details on the structures of the six algorithms included in this study. All six algorithms have a geometric component but the designs differ. Schuessler and Axhausen, Millard-Ball et al., and Perrine et al. use the distance between GPS records and a candidate link while Dalumpines and Scott, Li, and Schweizer et al. use a fixed buffer size around network links or GPS trajectories. Route topology (connectivity of the map-matched links) is considered in all algorithms except Li. Another distinction is that the Li algorithm is the only one to neglect link directionality, allowing wrong-way travel. All the algorithms employ a shortest-path sub-algorithm, either for gap treatment (Schuessler and Axhausen, Li, and Millard-Ball et al.) or in generating the map-matched route itself (Dalumpines and Scott, Schweizer et al., and Perrine et al.). Schuessler and Axhausen and Millard-Ball et al. are the only algorithms to consider the length similarity of the map-matched route to the GPS trajectory; Schuessler and Axhausen further compare the direction of the route to that of the GPS trajectory.
3 METHOD
Figure 1 illustrates the study methodology. Raw GPS data from real-world walking and cycling trips are filtered and cleaned and then strategically sampled to create an evaluation dataset of sample trajectories with ground-truth routes based on visual inspection. Routes are then generated for the sample trajectories by each of the map-matching algorithms and evaluated using a set of 11 diverse evaluation measures (Objective 1). Modifications of the best-performing algorithm (based on the evaluation results) are tested to enhance the application to active travel data (Objective 2). Next, an error indicator is developed from the evaluation measures that can be applied without ground-truth data to flag unreliable map-matching results in future analyses (Objective 3). The error indicator is applied to map-matched routes from the complete set of cleaned trajectory data and compared with contextual variables (GPS trajectory and network characteristics) to identify sources of map-matching errors (Objective 4).
3.1 Data
The GPS travel data for this study come from an active travel survey conducted in 2017 in metropolitan Vancouver, Canada [40]. Participants were persons of at least 14 years of age who “typically cycle at least once a week”. Participants recorded one week of their active travel (walking, cycling, running etc.) using a smartphone application recording GPS-based locations at 1-s intervals. The recording was manually started and stopped for each trip. Participants also indicated travel mode and trip purpose in the application. Only basic GPS data (timestamp and location) were recorded. Most participants used their personal smartphones to complete the survey; device details were not recorded. Recruitment and data collection occurred from June through October 2017.

The raw GPS data were filtered using the following steps:
1. Remove records with missing timestamp, latitude, or longitude,
2. Identify duplicate timestamps and, if present, keep only the earliest one, consistent with Zhou et al. [41],
3. Reorder or remove reverse time sequences,
4. Remove location errors based on “position jumps” in which the travelled distance is greater than a buffer of 30 m and at a speed of at least 50 m/s, based on Schuessler and Axhausen [2],
5. Split trajectories at a time gap greater than 8 h, presuming that it represents an activity, consistent with Biljecki [42], and
6. Remove trajectories with fewer than 120 records, which are unlikely to represent real trips and typically not useful in travel analysis.
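The filtering steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's script: the function names, the `(timestamp_s, lat, lon)` record layout, the use of haversine distance, and the interpretation of "earliest" duplicate as the first record encountered are all assumptions made for the example.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def filter_records(records, jump_m=30.0, jump_speed=50.0,
                   gap_s=8 * 3600, min_records=120):
    """records: iterable of (timestamp_s, lat, lon); returns a list of trajectories."""
    # Step 1: drop records with a missing timestamp, latitude, or longitude.
    recs = [r for r in records if None not in r]
    # Step 2: for duplicate timestamps keep only the first record encountered
    # (an assumed reading of "earliest").
    seen, unique = set(), []
    for r in recs:
        if r[0] not in seen:
            seen.add(r[0])
            unique.append(r)
    # Step 3: repair reverse time sequences (here by sorting on timestamp).
    unique.sort(key=lambda r: r[0])
    # Step 4: drop "position jumps" measured from the last retained record.
    kept = []
    for r in unique:
        if kept:
            d = haversine_m(kept[-1][1], kept[-1][2], r[1], r[2])
            dt = r[0] - kept[-1][0]  # > 0 because timestamps are unique and sorted
            if d > jump_m and d / dt >= jump_speed:
                continue
        kept.append(r)
    # Step 5: split wherever the time gap exceeds the threshold.
    trajectories, current = [], []
    for r in kept:
        if current and r[0] - current[-1][0] > gap_s:
            trajectories.append(current)
            current = []
        current.append(r)
    if current:
        trajectories.append(current)
    # Step 6: discard trajectories with too few records.
    return [t for t in trajectories if len(t) >= min_records]
```

The default thresholds mirror the values listed above; measuring each jump from the last retained record (rather than the raw predecessor) is a design choice of this sketch so that a single outlier does not also remove the valid record that follows it.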
After filtering, a trip identification algorithm was applied to identify travel segments, based on an algorithm proposed by Tran et al. [43], which was improved for active travel applications [20]. Finally, Support Vector Machine and C5.0 algorithms were applied to infer missing values of travel mode or trip purpose for 292 and 324 trips, respectively. This method achieved 95% and 87% accuracy for inferring mode and purpose, respectively, in a training dataset. The final dataset contains 2042 GPS trip trajectories recorded by 144 individuals, with 88% recorded on bicycles (or e-bicycles) and 12% on foot. The majority of trips were for work or school commuting (53%), followed by leisure or exercise (20%), personal errands (18%), and other purposes (9%).
Street network data for metropolitan Vancouver were extracted from OSM by overpassID [44]. All available tags were added to the extracted map by OpenStreetMap toolbox in ArcGIS. The OSMnx package in Python was used to create the node layer, with network topology consistent with vertical relationships in the OSM “layer” tag [45]. To construct a unidirectional network from the bidirectional OSM network, the streets with the tag “oneway” of FALSE were selected and copied into another layer, the direction of the copied links was reversed, and then the two layers were merged in ArcGIS. The unidirectional network consists of 449,647 links and 169,894 nodes. Land cover classification data were obtained from the Metro Vancouver open data catalogue with three classes: built-up, bare, and vegetation (including tree cover).
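As a rough illustration of the unidirectional network construction, the sketch below assumes one plausible reading of that step: every link is kept in its digitized direction, and each two-way link additionally receives a reversed copy. The study performed this step in ArcGIS; the dictionary keys and the `_fwd`/`_rev` identifiers here are hypothetical.

```python
def to_unidirectional(links):
    """links: list of dicts with keys 'id', 'from', 'to', 'oneway' (bool).

    Returns directed links: each input link in its digitized direction,
    plus a reversed copy for every link that is not one-way.
    """
    directed = []
    for link in links:
        # Keep the link as digitized.
        directed.append({"id": f"{link['id']}_fwd",
                         "from": link["from"], "to": link["to"]})
        if not link["oneway"]:
            # Two-way streets also get a copy with the end nodes swapped.
            directed.append({"id": f"{link['id']}_rev",
                             "from": link["to"], "to": link["from"]})
    return directed
```

With this representation, wrong-way travel on a one-way street is impossible by construction, which matters for the algorithms that respect link directionality.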
With consideration of the computational time costs of the map-matching algorithms (some of which were extremely long; see Section 4.2 below) and of manually determining the ground-truth sequence of links, a subset of the full GPS dataset was used for evaluation, with ground-truth routes identified by visual inspection. Trajectories from the full dataset were strategically sampled to obtain a representative combination of trips with respect to network density and average heading change. Trajectory link density was computed as the arithmetic mean of the link density values for each record, based on a link density raster with a cell size of 250 m. The link density raster was generated in ArcGIS using the Line Density feature. The average heading change for each trajectory was calculated using the equations in Zhou et al. [41].
Trajectories in the middle two duration quartiles (8 to 30 min) were segmented into nine strata: combinations of link density tertiles and heading change tertiles (with 100–156 trajectories in each stratum). Seven trajectories were then randomly selected from each stratum to generate the evaluation sample of 63 trip trajectories. Table 1 reports the statistical attributes of the sample trajectory set, which represents the full dataset well. These trips were each visually inspected by two researchers to identify the sequence of network links traversed in the trip (i.e. the ground-truth route) based on judgment. This approach uses around twice the sample size of past research assessing the performance of map-matching algorithms with ground-truth data [46, 47].
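The stratified sampling design can be sketched as follows. This is an illustrative sketch only: the rank-based tertile assignment, the dictionary keys `density` and `heading_change`, and the fixed random seed are assumptions, and the restriction to the middle two duration quartiles is assumed to have been applied to `trips` beforehand.

```python
import random

def tertile_labels(values):
    """Label each value 0, 1, or 2 by rank so the three groups are near-equal in size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 3 // len(values)
    return labels

def stratified_sample(trips, per_stratum=7, seed=0):
    """trips: list of dicts with 'density' and 'heading_change' values."""
    d = tertile_labels([t["density"] for t in trips])
    h = tertile_labels([t["heading_change"] for t in trips])
    # Nine strata: every combination of density tertile and heading-change tertile.
    strata = {}
    for trip, di, hi in zip(trips, d, h):
        strata.setdefault((di, hi), []).append(trip)
    rng = random.Random(seed)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Draw without replacement; small strata contribute all their members.
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample
```

Because the tertiles are computed independently for each variable, stratum sizes are generally unequal whenever the two variables are correlated, which is consistent with the 100–156 range reported above.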
| Attribute | Evaluation sample (N = 63) | Full dataset (N = 2042) |
|---|---|---|
| Travel mode | | |
| Bike/e-bike | 92% | 88% |
| Walk/run | 8% | 12% |
| Travel purpose | | |
| Commute | 52% | 53% |
| Errand/leisure | 37% | 35% |
| Exercise | 3% | 3% |
| Other | 8% | 9% |
| Cumulative distance^{1} (km) | 5.2 (2.6) | 7.1 (7.4) |
| Average speed (km/h) | 16.9 (5.8) | 16.9 (6.7) |
| Missing records proportion^{2} (%) | 15.7 (23.0) | 16.1 (20.9) |
| # of trajectories per individual | 1.3 (0.6) | 14.2 (7.9) |

^{1} Bottom four rows give mean (standard deviation).
^{2} $$\mathrm{Missing\ records\ proportion} = \left[ 1 - \left( \frac{\mathrm{\#\ of\ records}}{\mathrm{duration\ of\ trip}} \right) \right] \cdot 100\%$$.
3.2 Evaluation measures
Evaluation analysis was completed in R, Python, and ArcGIS. Any pre-map-matching filtering steps in the original studies were excluded to use the consistent set of filtering steps described above. The Li algorithm does not consider link directionality, so accuracy is measured considering both directions of network links within its map-matched routes. To be consistent with the other algorithms, for Perrine et al. we use the matched routes without the suggested manual corrections for poorly matched trajectories. As free-flow speed for links is determined only for car travel, we omit that term from the route scoring function in Schuessler and Axhausen (refer to Supporting Information for details).

1. Success rate compares the map-matched route to the ground-truth route to see if they are identical, with a binary outcome [14].
2. Route mismatch fraction is the cumulative length erroneously subtracted or added to the ground-truth route by the map-matching algorithm, divided by the length of the ground-truth route [18, 48]. See Figure 2 for an illustration.
3. The overlapping ratio is the ratio of the number of common links between the ground-truth and map-matched routes to the union of all the links present in the two [46, 47]. See Figure 3 for an illustration.
4. Recall is the ratio of the cumulative length of common links between ground-truth and map-matched routes to the length of the ground-truth route [49]. See Figure 4 for an illustration.
5. Precision is the ratio of the cumulative length of common links between ground-truth and map-matched routes to the length of the map-matched route [49]. See Figure 5 for an illustration.
6. F-measure is computed as $$2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$ [50].
7. The length index is the ratio of the length of the map-matched route to the cumulative distance between each consecutive pair of GPS records [24, 25, 48]. See Figure 6 for an illustration.
8. Average distance error per record is the average distance (in meters) between each GPS record and the map-matched route [24]. To consider the order in the map-matched sequence of links, two potential corresponding matched links are considered for each record: the link that the previous record was assigned to and the subsequent link in the sequence; the record is assigned to the closer link. If the assignment becomes “stuck” on a link (assigns the same intermediate link for all remaining GPS records), only GPS records with cumulative travel distance less than the link length are assigned to that link, and the assignment recommences from the next link in the sequence of map-matched links. See Figure 7 for an illustration.
9. Dynamic Time Warping (DTW) is the shortest warping path (in meters) that aligns the GPS trajectory to the projected trajectory, computed as the sum of distances for each record (considering link order as above) [51].
10. Frechet distance is commonly defined as the shortest leash length that a person and their dog would need to traverse the GPS trajectory and matched sequence of links, respectively (considering link order and in meters, as with DTW) [51].
11. Alignment is the difference in bearing (in degrees) between the GPS trajectory and the map-matched route over each 5-record interval. The bearing of the matched route is calculated between the orthogonal projection of the first record in the interval onto the map-matched route (considering link order as above) and the point found by continuing along the matched route over a distance equal to the cumulative distance between the 4 consecutive record pairs in the 5-record interval. See the illustration in Figure 8.
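Several of the measures above reduce to short computations, sketched here in Python under simplifying assumptions that differ from the study's implementation: routes are treated as sets of unique link identifiers for the length-based measures, coordinates are assumed to be planar and in meters, and DTW and Frechet distance are computed between two plain point sequences using the textbook dynamic-programming forms rather than the link-order-aware projection described above. All function and variable names are hypothetical.

```python
import math

def link_measures(truth, matched, length):
    """truth, matched: ordered lists of link ids (non-empty); length: link id -> meters."""
    t, m = set(truth), set(matched)
    common = t & m

    def total(ids):
        return sum(length[i] for i in ids)

    recall = total(common) / total(t)
    precision = total(common) / total(m)
    f_measure = 2 * precision * recall / (precision + recall) if common else 0.0
    return {
        "success": list(truth) == list(matched),            # identical link sequence
        "route_mismatch_fraction": total(t ^ m) / total(t),  # missed plus extra length
        "overlapping_ratio": len(common) / len(t | m),       # link counts, not lengths
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }

def length_index(matched_length_m, gps_points):
    """Matched route length over the summed distance between consecutive GPS records."""
    travelled = sum(math.dist(a, b) for a, b in zip(gps_points, gps_points[1:]))
    return matched_length_m / travelled

def dtw(p, q):
    """Dynamic time warping cost (sum of aligned point distances) between p and q."""
    inf = float("inf")
    prev = [inf] * (len(q) + 1)
    prev[0] = 0.0
    for a in p:
        cur = [inf] * (len(q) + 1)
        for j, b in enumerate(q, start=1):
            cur[j] = math.dist(a, b) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[-1]

def discrete_frechet(p, q):
    """Discrete Frechet distance (shortest 'leash') between point sequences p and q."""
    prev = []
    for i, a in enumerate(p):
        cur = []
        for j, b in enumerate(q):
            d = math.dist(a, b)
            if i == 0 and j == 0:
                best = d
            elif i == 0:
                best = max(cur[j - 1], d)
            elif j == 0:
                best = max(prev[0], d)
            else:
                best = max(min(prev[j], prev[j - 1], cur[j - 1]), d)
            cur.append(best)
        prev = cur
    return prev[-1]
```

A route mismatch fraction above 1 is possible in this formulation because the extra (falsely added) length is not bounded by the ground-truth length.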
3.3 Processing time
Processing times for the map-matching algorithms are compared for an example bicycle trip trajectory with a cumulative distance of 7.0 km and 1026 1-s records (comparable to 7.1 km average cumulative distance in the full dataset). All algorithms were implemented on a 64-bit Windows 10 desktop computer with 16.0 GB RAM, and generated at least one route for this trip. Caution should be exercised in interpreting results as algorithms are implemented via various software (R, Python, and ArcGIS) and scripting procedures might vary between the authors’ scripts and the open-source scripts. The steps required to prepare the input network (such as creating buffers around the network features in the Li and Schweizer et al. algorithms) are not included in the processing times, so as to represent the marginal (scaling) time cost.
3.4 Error detection
Since most GPS studies do not have ground-truth data, the use of ground-truth-dependent evaluation measures is limited. To help assess the reliability of map-matching results in the absence of ground-truth data, an error indicator is developed using the five evaluation measures above that do not rely on ground-truth data: length index, average distance error per record, DTW, Frechet distance, and alignment. Length index is transformed as the absolute value of length index minus 1 so that a reliable map-matching can be represented by lower values of the error indicator. Factor analysis, a tool to reduce data variables into a parsimonious set of factors that capture the maximum amount of common variance, is used to derive the weights for each component of the error indicator [52]. The error indicator expression is obtained by the weighted sum score method [53]. A threshold for identifying questionable matched routes is derived from the error indicator values for the ground-truth routes.
The error indicator is applied to map-matching results for the entire cleaned dataset of 2042 active travel trajectories. To find possible sources of map-matching error, relationships are examined between the error indicator for the full dataset and GPS trajectory characteristics (heading change, trip duration etc.) and network characteristics (tree cover, network density etc.).
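The structure of a weighted-sum error indicator can be sketched as below. The weights and threshold in the usage example are placeholders chosen for illustration only; they are not the factor-analysis weights or the ground-truth-derived threshold from this study, and any scaling of the component measures before weighting is left unspecified. The dictionary keys and function names are hypothetical.

```python
def error_indicator(measures, weights):
    """Weighted sum of ground-truth-independent measures.

    measures: dict with keys 'length_index', 'distance_error', 'dtw',
              'frechet', 'alignment' (hypothetical names).
    weights:  dict with the same keys (placeholder values, not the study's).
    """
    x = dict(measures)
    # Transform the length index so that 0 is best, like the other components.
    x["length_index"] = abs(x["length_index"] - 1.0)
    return sum(weights[k] * x[k] for k in weights)

def flag_for_inspection(measures, weights, threshold):
    """True when the indicator exceeds the (assumed) threshold."""
    return error_indicator(measures, weights) > threshold
```

For example, with placeholder weights `{"length_index": 0.5, "distance_error": 0.1, "dtw": 0.01, "frechet": 0.05, "alignment": 0.02}`, a route whose length index is 1.2 contributes 0.5 × 0.2 to the indicator from that component, the same as a route whose length index is 0.8.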
4 RESULTS
4.1 Accuracy
Table 2 gives the number of map-matched routes returned per trajectory by each of the six algorithms, as well as the success rate in perfectly matching the ground-truth routes. All the algorithms generate at least one solution for each trajectory except for Schuessler and Axhausen and Dalumpines and Scott, which both have fairly high rates of failure to identify a route (29% and 38% of trajectories, respectively). This happens when Schuessler and Axhausen discard all candidate routes based on poor internal scoring, or Dalumpines and Scott fail to find a path through the constrained network (within a buffer of the trajectory locations). These rates are toward the high end of the range of past studies that failed to find routes for 7% to 46% of trajectories by these algorithms [15, 23, 25, 35].
Number of trajectories with:

| Algorithm | No map-matched route | 1 map-matched route | 2+ map-matched routes | Success rate (perfectly matched routes) |
|---|---|---|---|---|
| Schuessler and Axhausen | 18 | 44 | 1 | 4 |
| Dalumpines and Scott | 24 | 39 | 0 | 4 |
| Li | 0 | 63 | 0 | 0 |
| Schweizer et al. | 0 | 54 | 9 | 0 |
| Millard-Ball et al. | 0 | 63 | 0 | 5 |
| Perrine et al. | 0 | 63 | 0 | 0 |
Both Schuessler and Axhausen and Schweizer et al. also return multiple matched routes for some trajectories, caused by equal scores assigned by the algorithms to the generated routes. For the rest of this Results section, average evaluation measure values are reported for the trajectories with multiple matched routes. Only half the algorithms generated the exact sequence of links in the ground-truth route for any trajectories: Schuessler and Axhausen, Dalumpines and Scott, and Millard-Ball et al., all with success rates under 8%. These perfect-match success rates are much lower than reported in some past studies using Millard-Ball et al. and Dalumpines and Scott, mostly for car trips [14, 26, 30, 31].
Figures 9 and 10 give the results of the other five evaluation measures based on ground-truth data. Lower route mismatch fractions indicate fewer mismatched links between map-matched route and ground-truth route. Perrine et al. and Schweizer et al. have an average route mismatch fraction higher than 1 (2.73 and 1.21, respectively), which indicates that the length of links falsely included in or excluded from the map-matched routes exceeds the length of the ground-truth route. For other algorithms, the average route mismatch fraction is usually less than 0.5, but with substantial variation across trajectories (Figure 9).
Figure 10 confirms the same pattern observed in the route mismatch fraction. Higher values (closer to 1) for the evaluation measures presented in Figure 10 indicate better-performing algorithms. Schuessler and Axhausen, Dalumpines and Scott, and Millard-Ball et al. have the highest overlapping ratios (averaging 0.69 to 0.76 across trajectories). Perrine et al. and Schweizer et al. have much lower overlapping ratios averaging 0.22 and 0.24; Li was in the middle with an average overlapping ratio of 0.48. This pattern generally holds for the other three evaluation measures, with the best values for Schuessler and Axhausen, followed closely by nearly identical values for Dalumpines and Scott and Millard-Ball et al., then somewhat poorer values for Li and much poorer values for Perrine et al. and Schweizer et al. Higher recall than precision values (e.g. Perrine et al.) indicate an abundance of falsely identified links in the map-matched routes in addition to many of the true links, while the opposite (e.g. Li) indicates an overly conservative algorithm with fewer falsely matched links but a low portion of the true links identified. The other four algorithms have a balance of recall and precision.
It should be noted that Schuessler and Axhausen and Dalumpines and Scott only have evaluation measures for 71% and 62% of the trajectories, respectively, which may have been the easier trajectories to match. All algorithms except Perrine et al. yield better evaluation measures when applied only on the trajectories with at least one solution from all algorithms (49% of trajectories). However, the improvements are small (under 5%).
Figures 11–14 give results for the five ground-truth-independent evaluation measures along with the same measures computed for ground-truth routes. Since travelled routes are rarely along network link centerlines, the measures for the ground-truth routes indicate the optimal values for a perfectly map-matched route. Figure 11 shows performance in terms of length index, for which values closer to 1 are better. Consistent with the recall and precision values, Perrine et al. have high length indices, indicating the inclusion of many extraneous links, while Li has index values below 1, indicating overly conservative routes missing many links (but without extraneous ones). The other algorithms produce routes with length index values averaging 0.99 to 1.06 across trajectories. Better-performing algorithms are expected to have lower average distance error (Figure 12). Dalumpines and Scott produce the lowest average distance error (mean of 49 m and standard deviation of 242 m), closest to ground-truth values (mean of 6 m and standard deviation of 11 m). The average distance error for the other algorithms is higher (with means ranging from 90 m for Schuessler and Axhausen to 745 m for Perrine et al.).
The similarity measures DTW and Frechet distance give additional measures of the aggregate distance between the GPS records and the matched network links, averaging 4663 and 21 m for the ground-truth routes, respectively (Figure 13). Algorithm-produced values for DTW and Frechet indicate the best performance for Dalumpines and Scott, Schuessler and Axhausen, and Millard-Ball et al. The alignment measure gives information about the orientation of the matched route compared to the GPS trajectory, which averaged 20 degrees difference for the ground-truth data (Figure 14). Dalumpines and Scott, Millard-Ball et al., and Schuessler and Axhausen were very close to this with average values of 20, 22, and 22 degrees, respectively. The other three algorithms averaged in the range of 24 to 33 degrees.
Five of the 63 sample trajectories were travelled on foot; those trips have a similar duration (averaging 17 min) to the cycling trips (averaging 18 min), but at lower average speeds. The evaluation measure results are mixed when comparing map-matching performance for walking versus cycling trips. Differences in performance for the five ground-truth-based measures (route mismatch fraction, overlapping ratio, recall, precision, and F-measure) are all within 0.05 for Millard-Ball et al., Dalumpines and Scott, and Schuessler and Axhausen. The other three algorithms had more varying performance between the modes, generally better for walking than cycling.
4.2 Processing time
Table 3 gives processing times for the algorithms in seconds for the example bicycle trip (with 1026 records at 1 s intervals), as well as the processing times reported in original papers. Dalumpines and Scott and Li (mainly implemented in ArcGIS) have the shortest processing times (0.06 and 0.24 s/record, respectively), followed by MillardBall et al. (0.43 s/record) and then Perrine et al. (0.71 s/record). Schuessler and Axhausen and Schweizer et al. have much longer processing times (8.10 and 4.85 s/record, respectively). In addition to the complexity of Schuessler and Axhausen and Schweizer et al., these algorithms are scripted in R, which can be slower than ArcGIS for certain spatial analysis tasks. Processing times for the example trajectory are shorter than originally reported in Dalumpines and Scott, but longer than originally reported for Schuessler and Axhausen, Schweizer et al., and MillardBall et al. These differences may be due to the size and complexity of the trajectory and network data, the algorithm coding in the scripts (for the nonopensource algorithms), and the operating machines and software.
Algorithm  Implementation software  Opensource script  Processing time (s/trajectory)  Processing time (s/record)  Processing time (s/matched meter)  Reported processing time in the original paper 

Schuessler and Axhausen  R  No  8310  8.10  1.17  0.01–0.08 s/record 
Dalumpines and Scott  ArcGIS and R  No  61  0.06  0.01  6 s/record 
Li  ArcGIS and R  No  244  0.24  0.05  Not reported 
Schweizer et al.  ArcGIS and R  No  4979  4.85  0.70  0.002 s/matched meter 
MillardBall et al.  Python  Yes  445  0.43  0.06  14 s/trajectory 
Perrine et al.  Python  Yes  731  0.71  0.04  Not reported 
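The per-record processing times in Table 3 follow from dividing the total time for the example trajectory by its 1026 records. The short sketch below reproduces that arithmetic from the tabulated totals; the variable and function names are illustrative only.

```python
RECORDS = 1026  # number of records in the example bicycle trip (from the text)

# Total processing time for the example trajectory, in seconds (Table 3).
seconds_per_trajectory = {
    "Schuessler and Axhausen": 8310,
    "Dalumpines and Scott": 61,
    "Li": 244,
    "Schweizer et al.": 4979,
    "MillardBall et al.": 445,
    "Perrine et al.": 731,
}

def seconds_per_record(total_seconds, n_records=RECORDS):
    """Normalize a total processing time by the number of GPS records."""
    return total_seconds / n_records

per_record = {
    name: round(seconds_per_record(total), 2)
    for name, total in seconds_per_trajectory.items()
}
```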
4.3 Evaluation summary
Considering all the measures, MillardBall et al. is the bestperforming algorithm overall; it generated a unique route for every trajectory and achieved the highest success rate. Compared to the other algorithms, MillardBall et al. mapmatched routes have on average: low deviation from the groundtruth route (route mismatch fraction of 0.38), high coverage of the groundtruth route (overlapping ratio of 0.69), and high accuracy (Fmeasure of 0.82). The routes generated by MillardBall et al. also have lengths similar to GPS travelled distance (average length index of 1.04, versus a groundtruth value of 1.02). Two other algorithms (Schuessler and Axhausen and Dalumpines and Scott) had similar performance across most of the accuracy measures but crucially failed to generate routes for 29% and 38% of trajectories, respectively. The other three algorithms yielded routes with substantially poorer accuracy measures.
The results for MillardBall et al. were weakest with respect to the GPS trajectory shape and bearing (average distance error, similarity measures, and alignment results). Visual inspection indicates that the errors were often related to wrongway travel on oneway streets. Discarding the results of trajectories for which Dalumpines and Scott or Schuessler and Axhausen did not generate a solution (32 trajectories), MillardBall et al. produce better average distance error, DTW, Frechet distance, and alignment values. The issue of wrongway travel is examined further below.
Comparing results across measures illustrates the importance of considering multiple dimensions of performance in accuracy assessments. When algorithms produce an incorrect sequence of links (as is the case the vast majority of the time), we want to know not just how much of the true route they have identified, but also whether they are including many false links, missing many used links, or departing widely from the travelled path.
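To make the distinction between false links and missed links concrete, the sketch below computes precision, recall, and Fmeasure from sets of link identifiers. This is a simplified, count-based illustration under the assumption that each link counts equally; the measures used in this study may be defined differently (for example, weighted by link length), and the link identifiers are hypothetical.

```python
def link_accuracy(matched_links, truth_links):
    """Return (precision, recall, f_measure) for a matched route.

    precision: share of matched links that are truly on the route
               (low when many false links are included)
    recall:    share of true links that were matched
               (low when many used links are missed)
    """
    matched, truth = set(matched_links), set(truth_links)
    true_positives = len(matched & truth)
    precision = true_positives / len(matched) if matched else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical link IDs: the match misses link 1 and adds false links 5 and 6.
precision, recall, f_measure = link_accuracy([2, 3, 4, 5, 6], [1, 2, 3, 4])
```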
4.4 Error indicator
The expression for the composite error indicator obtained by factor analysis is $$0.39\ LI + 0.94\ ADE + 0.67\ A + 0.95\ DTW + 0.74\ FD$$, combining the normalized values of transformed length index (LI), average distance error per record (ADE), alignment (A), dynamic time warping (DTW), and Frechet distance (FD) for each generated route by MillardBall et al. The indicator explained 59% of the variance in the trajectorylevel data with Kaiser–Meyer–Olkin (0.53) and Bartlett's test of sphericity (p < 0.01) confirming sampling adequacy and sufficient correlation between variables, respectively [52]. The error indicator values are moderately correlated to the “PgMapMatch score” for matched routes by MillardBall et al., with a correlation between the two of −0.37 (reliable mapmatching should have a high PgMapMatch score and low error indicator value, hence the negative correlation). Visual inspection reveals that the PgMapMatch score and the error indicator are misaligned where a traveller traversed a oneway street in the wrong direction. More generally, the key sources of mapmatching error to look for when inspecting matched routes are:

Cyclists travelling in the wrong direction on a oneway street (which is not allowed in MillardBall et al. routes),

Travel on paths that are not represented in the network data (missing links), and

Matched routes on parallel street facilities.
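The composite error indicator can be sketched as a weighted sum using the factor loadings given above, with a route flagged for inspection when the indicator exceeds the 0.5 threshold used in this section. The sketch assumes the five inputs have already been transformed and normalized; the normalization procedure itself is not reproduced here, and the function names and example values are hypothetical.

```python
# Factor loadings from the composite error indicator expression.
WEIGHTS = {"LI": 0.39, "ADE": 0.94, "A": 0.67, "DTW": 0.95, "FD": 0.74}
FLAG_THRESHOLD = 0.5  # routes above this value are flagged for visual inspection

def error_indicator(normalized):
    """Weighted sum of normalized measures (keys: LI, ADE, A, DTW, FD)."""
    return sum(weight * normalized[key] for key, weight in WEIGHTS.items())

def needs_inspection(normalized, threshold=FLAG_THRESHOLD):
    """True when a matched route should be flagged as potentially unreliable."""
    return error_indicator(normalized) > threshold

# Hypothetical normalized values for two matched routes.
good_route = {"LI": 0.1, "ADE": 0.1, "A": 0.1, "DTW": 0.1, "FD": 0.1}
poor_route = {"LI": 0.2, "ADE": 0.4, "A": 0.3, "DTW": 0.5, "FD": 0.4}
flags = [needs_inspection(good_route), needs_inspection(poor_route)]
```

Because the indicator needs only the trajectory and the matched route, it can be applied to every trip in a dataset without groundtruth data.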
Error indicator values were also calculated for the mapmatched routes of MillardBall et al. applied to the full set of 2042 GPS trajectories in the dataset. Error indicator values ranged from 0.0 to 13.4, averaging 0.5 with a standard deviation of 0.9; 19% of the matched routes are flagged as potentially unreliable (error indicator > 0.5). Table 4 gives Pearson correlation coefficients between the error indicator values and trajectory and network characteristics for each matched trip in the full dataset. Significantly higher error indicator values are associated with: longer trips, trips with speed outliers, and trips with more bridges and underpasses, oneway streets, and cycling facilities in the surrounding network. More oneway streets increase opportunities for wrongway travel, and more cycling facilities increase opportunities for mismatching to a parallel facility. Similar to these results, Chao et al. [6] also report mapmatching errors associated with trajectory outliers; in contrast, they report greater mapmatching errors in more complex networks whereas we found no significant association with link density.
Trajectory or network characteristic  Definition  Average (standard deviation)  Correlation coefficient with error indicator^{a} 

Length  Sum of travelled distance between consecutive pairs of GPS records (m)  7062 (7411)  0.32* 
Travel mode  Travel mode (bike = 1, other = 0)  Bike = 1789  0.05 
Average heading change  Average heading change between consecutive pairs of GPS records (deg)  13.2 (8.8)  <0.01 
Gap proportion  Seconds of missing data (time differences more than sampling interval) divided by the duration of the trajectory  0.16 (0.21)  <0.01 
Speed outlier  Difference between maximum and average speed (m/s) in the trajectory  11.9 (12.6)  0.09* 
Network density  Average link density for trip  4908 (368)  0.02 
Bridge/underpass proportion  Total length of overpass/underpass links (from OSM) intersecting (completely or partially) a 50m buffer around the trajectory divided by the total length of links intersecting the buffer  0.05 (0.07)  0.05* 
Tree canopy proportion  Total length of links with tree canopy land cover intersecting (completely or partially) a 50m buffer around the trajectory divided by the total length of links intersecting the buffer  0.26 (0.13)  0.03 
Oneway street proportion  Total length of oneway streets intersecting (completely or partially) a 50m buffer around the trajectory divided by the total length of links intersecting the buffer  0.11 (0.08)  0.06* 
Cycling facility proportion  Total length of cycling facilities (from OSM) intersecting (completely or partially) a 50m buffer around the trajectory divided by the total length of links intersecting the buffer  0.28 (0.12)  0.07* 
 ^{a} For travel mode (binary variable), the average error indicator difference is reported (bike minus walk), with ttest significance.
 * Significant at p < 0.05.
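For reference, the Pearson correlation coefficients in Table 4 can be computed as in the sketch below, which pairs one trip characteristic with the error indicator across trips. The function name and the toy values are hypothetical and are not drawn from the study data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical trips: trip length (m) and error indicator value.
trip_length = [1200.0, 3400.0, 5600.0, 7800.0, 15000.0]
indicator = [0.2, 0.3, 0.3, 0.6, 1.1]
r = pearson(trip_length, indicator)
```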
5 PROPOSED MODIFICATIONS
5.1 MillardBall et al. (PgMapMatch)
Three modifications to PgMapMatch are proposed to better suit active travel data:

The link cost function is changed to be based on length instead of travel time since speed limits have little relevance for active travel speeds,

Wrongway travel on oneway streets (“salmoning”) is allowed instead of prohibited in routes, but penalized by a link cost multiplier of 1.5–6, based on Broach et al. [54], and

The link cost of cycling facilities is reduced by multiplying it by a factor of 0.1–0.8, based on cycling route choice studies [23, 34, 54, 55].
For the second and third modifications, specific parameter values are selected for each modification individually using the six groundtruthdependent measures. In addition, jointly optimal parameter values are identified using the same measures by applying the three modifications combined through a grid search of possible parameter values. The optimal combination was selected based on the magnitude of improvement for the majority of the measures.
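The three modifications and the parameter search can be sketched as follows. This is an illustration under stated assumptions rather than the PgMapMatch implementation: the two multipliers are assumed to combine multiplicatively, the `evaluate` callback (returning a single score such as a mean Fmeasure for a parameter pair) is hypothetical, and the search here maximizes one score whereas the selection described above considers the majority of measures.

```python
from itertools import product

def link_cost(length_m, wrong_way=False, cycling_facility=False,
              salmon_factor=1.5, facility_factor=0.8):
    """Length-based link cost with active travel adjustments.

    salmon_factor   (> 1) penalizes wrongway travel on oneway streets.
    facility_factor (< 1) discounts links that are cycling facilities.
    """
    cost = length_m
    if wrong_way:
        cost *= salmon_factor
    if cycling_facility:
        cost *= facility_factor
    return cost

def grid_search(evaluate, salmon_values, facility_values):
    """Return (score, salmon_factor, facility_factor) with the highest score."""
    best = None
    for salmon, facility in product(salmon_values, facility_values):
        score = evaluate(salmon, facility)
        if best is None or score > best[0]:
            best = (score, salmon, facility)
    return best

# Toy stand-in for running the mapmatching and scoring it against groundtruth.
def toy_evaluate(salmon, facility):
    return -(abs(salmon - 1.5) + abs(facility - 0.8))

best = grid_search(toy_evaluate, [1.5, 3.0, 4.5, 6.0], [0.1, 0.4, 0.8])
```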
Table 5 gives average groundtruth evaluation measures for the modified algorithm (recall and precision were similar to the Fmeasure). The modifications each improve algorithm performance modestly by all measures except success rate. The distancebased cost modification is the most impactful, followed by the cycling facility adjustment and then the salmoning penalty. But the improvements are not additive: the combined modification (all three) is only better than the distancebased cost improvement alone for the success rate measure. The jointly optimal modifications (distancebased cost, salmoning penalty factor of 1.5, and cycling facility factor of 0.8) yield a further enhancement of all evaluation measures except success rate.
Modification  Success rate  Overlapping ratio  Route mismatch fraction  Fmeasure  

Original  5  0.69  0.38  0.82 
Distancebased cost  5  0.78  0.19  0.91 
Salmoning penalty factor of 4.5  5  0.71  0.34  0.83 
Cycling facility factor of 0.1  7  0.76  0.23  0.89 
Combined modifications  6  0.79  0.18  0.91 
Jointly optimal modifications  5  0.81  0.14  0.93 
5.2 Other algorithms
Optimal modifications of all the algorithms are beyond the scope of this study, but an examination of the error sources suggests possible directions for improved performance on walking and cycling data. Schuessler and Axhausen fail to find solutions for 29% of trajectories, but visual inspection of the discarded candidate routes reveals that some of the routes were accurate. Thus, thresholds for the filtering step could be modified. Another common error occurs when the algorithm falsely determines that the end of the candidate link has been reached using distancebased heuristics. To address this issue, filtering and smoothing of the GPS records might enhance algorithm performance by reducing distanceinflating location noise. Finally, the prohibition on Uturns in the matched routes also appears to be a barrier to correct route identification in some cases and should be reconsidered.
Dalumpines and Scott also fail to find solutions for a large portion of the trajectories. Removing the restriction of oneway directionality reduces the failure rate from 38% to 10%. Alternatively, enlarging the trajectory buffer used to constrain the network would allow the algorithm to find a path around oneway streets (although the identified path would be erroneous if the traveller had in fact used a wrongway link). Dalumpines and Scott can also fail to generate a correct solution when the trip origin and destination are located close to each other, which is the case for loop/exercise trips. To address this, Kam et al. [25] proposed adding three intermediate stops along the trip.
For Li, link assignment for some records is based on voting among links assigned to the ten preceding and ten succeeding records. This fails for short links and for records adjacent to missing data in longer time gaps. In dense parts of a network, the algorithm performs worse and is more likely to create a discontinuous route. Schweizer et al. generated erroneous routes far from the trajectory because of an overemphasis on using cycling facilities, particularly when there were parallel lowtraffic local streets. This algorithm could be enhanced by reexamination or recalibration of the link cost function, possibly also including other variables in the function such as road grade. Perrine et al. generated the least accurate routes, substantially poorer than reported in the original paper, possibly because it was developed for higherspeed vehicles on a sparser network. The algorithm might also rely heavily on the suggested manual corrections after the initial route assignment, which were not implemented here.
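The weakness of neighbour voting on short links can be illustrated with a minimal sketch. It assumes a simple majority vote over the links assigned to the ten preceding and ten succeeding records; this is a simplified stand-in for the voting step in Li, not the original implementation, and ties are resolved arbitrarily here.

```python
from collections import Counter

def vote_link(assigned, i, window=10):
    """Majority vote among links assigned to neighbouring records.

    assigned: list of link IDs per GPS record (None for unassigned records)
    i:        index of the record whose link is being decided
    """
    neighbours = assigned[max(0, i - window):i] + assigned[i + 1:i + 1 + window]
    votes = Counter(link for link in neighbours if link is not None)
    return votes.most_common(1)[0][0] if votes else None

# A short link "B" covered by only three records between longer links "A" and "C".
assigned = ["A"] * 10 + ["B"] * 3 + ["C"] * 10
voted = vote_link(assigned, 11)  # the middle record on link "B"
```

In this toy case the record on the short link is outvoted by the neighbouring longer links, which matches the failure mode described above.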
6 CONCLUSION
In this evaluation of mapmatching algorithms using realworld active travel data from metropolitan Vancouver, Canada, the PgMapMatch algorithm from MillardBall et al. [14] proved to be the most reliable and to have a reasonably low processing time. Most of the performance measures for matched routes by Schuessler and Axhausen and Dalumpines and Scott were similar, but these algorithms failed to generate matched routes for around a third of trajectories (inflating aggregate performance measures for the remaining routes). The other three algorithms yielded routes with substantially poorer performance measures.
These bestperforming algorithms were not developed specifically for active travel data. Our results suggest that PgMapMatch could be further improved for active travel by using distance instead of travel time to generate link costs, lowering link costs for offstreet facilities, and allowing but penalizing wrongway travel. Adjustments to Schuessler and Axhausen (filtering thresholds for candidate routes and smoothing and filtering for GPS errors) and Dalumpines and Scott (allowing wrongway travel and widening the search buffer for candidate links) could allow for more trajectories to be matched. Whether these modifications would also improve performance in other networks and for other active travel modes (scooters, skating, etc.) is unknown and must be investigated in future work.
The bestperforming algorithms (with 70–90% accuracy, depending on the measure) are still imperfect (at least 92% of matched routes had at least one wrong link by all algorithms), so manual postprocessing (error detection and correction) is still necessary for reliable map matching. For this purpose, we propose a composite error indicator that can be calculated from the trajectory and network data alone (i.e. without groundtruth data), along with a threshold for flagging potentially inaccurate routes needing manual inspection. When inspecting, our results show that key sources of error to look for are missing links used by travellers but absent in the network data, wrongway travel (a form of missing links), and routes mismatched to parallel facilities on the same street used by the traveller. Errors are also more common around bridges and underpasses. Speed outliers also lead to errors, so better preprocessing for speed could reduce mapmatching errors. Missing links where paths are observed in GPS trajectories can be added to the network to improve mapmatching results for other trips as well.
6.1 Limitations and future research
The key strength of this analysis is external, comparative validation of multiple algorithms using independent, realworld GPS data. The inherent limitation is that algorithm performance on other data sets (from other regions and collected using different methods) may vary. Our results suggest that network characteristics (e.g. presence of bridges and underpasses) and GPS trajectory characteristics (e.g. presence of speed outliers) influence mapmatching errors. Mapmatching algorithm performance may also depend on preprocessing steps, which should be investigated in future work.
A related issue for future research is the effect of potential gap treatment methods for missing GPS data on mapmatching algorithm performance. Alleviating the influence of noise through spatial smoothing should also be investigated as a possible preprocessing step to improve mapmatching performance. Another direction for future work is automated methods for identifying missing links in network data by mapping large datasets of GPS trajectories onto the street network.
Mapmatching is a crucial step in GPS travel data analysis and one for which we still do not have a reliable method for all contexts or datasets. The findings in this paper offer specific guidance for analysts working with active travel datasets, along with suggested directions for future improvement of mapmatching algorithms for active travel. A final suggestion is for full reporting of all preprocessing and mapmatching methods used in future studies employing GPS data to advance understanding of the stateofpractice and to enhance reproducibility and reliability.
ACKNOWLEDGMENTS
This research was enabled by support from the Social Sciences and Humanities Research Council of Canada (SSHRC) Insight Development Grant #430201900049, and the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant #RGPIN201604034.
CONFLICT OF INTEREST
The authors declare that they have no conflicts of interest.
Open Research
DATA AVAILABILITY STATEMENT
Study data are not available for sharing to protect participant privacy, per guidance from the University of British Columbia Behavioural Research Ethics Board (UBC BREB number: H1700294).