Volume 17, Issue 1 p. 227-242
ORIGINAL RESEARCH
Open Access

Evaluation of map-matching algorithms for smartphone-based active travel data

Elmira Berjisian

Department of Civil Engineering, University of British Columbia, Vancouver, British Columbia, Canada

Alexander Bigazzi

Corresponding Author

Department of Civil Engineering, University of British Columbia, 6250 Applied Science Lane, Vancouver, British Columbia, Canada

Correspondence

Department of Civil Engineering, University of British Columbia, 6250 Applied Science Lane, Vancouver, British Columbia V6T1Z4, Canada.

Email: [email protected]

First published: 27 August 2022

Abstract

Global Positioning System (GPS) data on walking and cycling trips can generate useful insights for transportation systems but require substantial processing. One of the key GPS data processing steps is “map-matching”, or inference of the sequence of network links traversed during travel. The objective of this research is to evaluate the accuracy of existing map-matching algorithms for GPS data on active travel. A method to flag erroneous map-matching results without requiring ground-truth data and improvements for active travel data are also proposed. Six map-matching algorithms are applied to a sample of 63 trajectories, stratified on network density and average heading change, extracted from a large set of real-world trips from metropolitan Vancouver, Canada. Results show that the best performing method is PgMapMatch, which can be further improved by adjustments to link costs and allowing wrong-way travel. Two other algorithms have similarly accurate routes (70–90% accuracy, depending on the measure), but fail to generate routes for about a third of trips. The proposed error detection measure can be used (without ground truth data) to flag matched routes requiring visual inspection, with a recommendation to look for: Wrong-way travel, missing links in the network data, and parallel facilities on the same street.

1 INTRODUCTION

Active travel (primarily walking and cycling) is essential to sustainable transportation systems. To encourage the use of active travel modes, transportation policies, programs, and plans should be informed by insights from real-world active travel behaviour. The increasingly widespread use of GPS tracking (for fitness applications, navigation systems, household travel surveys, and bike-sharing systems, among others) presents the opportunity to collect an abundance of active travel data from a wide spectrum of travellers.

GPS data provide detailed but imperfect travel records, with location errors and missing observations being common. Substantial data processing is required to account for GPS data errors as well as infer additional features of the recorded trips [1]. The typical steps to process GPS data are (1) filtering, (2) trip identification, (3) inferring travel mode and purpose, and (4) matching GPS records to the underlying street network (map-matching) [2]. The input to these successive processing steps is a GPS trajectory: a temporally ordered sequence of the traveller's location coordinates [3]. The basic data provided by GPS are timestamps and spatial coordinates (longitude, latitude, and sometimes altitude) for discrete observations. Additional information is sometimes provided from the GPS device (such as the number of satellites in view at the time of recording [4]) or from other sensors (such as accelerometry [5]).

The focus of this paper is map-matching for active travel, a crucial step that enables the integration of GPS data with network information such as facility type, traffic volume, and road grade. Understanding map-matching algorithm performance is vital because the map-matching results will impact the successive analysis of travel behaviour [3]. Mismatched routes could, for example, lead to erroneous data on walking and cycling volumes or incorrect inference of traveller preferences from route choice models. Various methods for map-matching have been used in past studies on walking and cycling, but there has been no rigorous comparison of their performance for active travel data. Thus, existing literature provides little justification for the selection of any one particular map-matching algorithm, and almost no information about its transferability to other datasets (e.g. see [7, 8]). Map-matching can be particularly challenging for GPS data from active travel because pedestrians and cyclists use travel paths that may not be represented in the street network data (such as short-cuts through parks). In addition, pedestrians and cyclists use a diverse combination of (sometimes parallel) facilities including sidewalks, multi-use paths, bicycle lanes, and mixed-traffic lanes, and network complexity is a key challenge for the map-matching problem [6].

The objectives of this paper are to (1) evaluate the accuracy of existing map-matching algorithms for smartphone-derived active travel GPS data, (2) examine the potential to improve the best-performing algorithm by tuning it for active travel, (3) propose an error detection method to flag problematic map-matching results in future applications without requiring ground-truth data, and (4) examine sources of error in map-matched routes. This study contributes to the literature by providing the first independent comparative evaluation of map-matching algorithm performance for active travel data. External validation such as reported here is a crucial but often-neglected step in the acquisition of (natural and social) scientific knowledge [9, 10]. We aim to provide specific, actionable information for analysts processing GPS data in research and practice to select an algorithm and evaluate the accuracy of map-matching results. We also aim to inform researchers developing map-matching algorithms by identifying common sources of error from existing algorithms and the characteristics of robust map-matching for walking and cycling trips.

2 LITERATURE REVIEW

2.1 Past evaluations

Map-matching has been an active area of research for many years, and review studies have intermittently summarized and catalogued the various approaches used [11-13]. Map-matching algorithms can be classified into geometric, topological, and advanced algorithms [14-16]. Geometric approaches match GPS records based on the closest network features (node or link) or similarity of GPS trajectory shape to the network links (curve-to-curve matching). Topological approaches can overcome some errors of geometric approaches by additionally considering the connectivity of the network. The third group of map-matching algorithms employs advanced methods such as Hidden Markov Models in addition to geometric and topological approaches to generate the most likely route [17, 18].

Most papers proposing map-matching algorithms include performance measures in their results (usually from cross-validation), but external evaluations and comparisons of map-matching performance are rare. A recent comparison of three map-matching algorithms using GPS data from taxis in Beijing in addition to 100 trajectories from 100 areas around the globe identified GPS errors and network complexity as the key map-matching challenges [6]. Another recent study on road-based navigation systems using GPS records collected at speeds of 5 to 80 km/h concluded that advanced methods (particularly using data processed by Kalman filtering in a topological algorithm) perform better than geometric or topological algorithms [19]. Additional evidence for enhanced performance of advanced algorithms is provided by a study using cycling trajectories that compared a geometric algorithm to a Hidden Markov Model-based advanced technique (with 90% correctly matched links by the advanced technique compared to 70% by the geometric algorithm, both adjusted to prefer cycling facilities) [17].

In general, external validations often yield poorer performance than cross-validation reported by algorithm developers, due to over-calibration and hence poor transferability to new data sets collected in other contexts (modes, networks etc.) [20-22]. Performance can also vary substantially by the measures used to represent algorithm accuracy. Some past studies only report if the map-matching algorithm successfully generates a route for a given trajectory [23], while others use visual inspections [15]. When the ground-truth route is known, studies have evaluated accuracy using a binary variable indicating a complete reproduction of the ground-truth route [14], or a continuous variable indicating the proportion of the ground-truth route correctly reproduced [17]. In the absence of ground-truth routes, studies have compared the generated routes to the GPS trajectory data to evaluate the similarity of attributes such as distance travelled [24].

The literature indicates that advanced methods can generate more accurate routes compared to geometric and topological approaches alone. However, no study has systematically measured and compared the competency of advanced map-matching algorithms for application to GPS data from active travel, across multiple dimensions of performance. Map-matching for active travel is more challenging than for car trips as street network data can lack representation for paths taken by cyclists or pedestrians (for instance, shortcuts through parks [25]). Analysts working with walking and cycling GPS data currently lack evidence for the selection of a specific map-matching algorithm. This study is motivated by the lack of independent comparative evaluations of existing map-matching algorithms for active travel data.

2.2 Selected algorithms

To be included in this study, algorithms had to be published with a reproducible methodology or open-source script and be applicable to basic GPS data (timestamp, latitude, longitude) and network data (link and node locations, topology, and facility types). The most commonly used map-matching methods from the literature were identified by reviewing studies published since 2010 that used cycling GPS data. That search was executed using the Google Scholar search engine in September 2019 and yielded four map-matching algorithms frequently employed in past studies [16, 24, 26, 27]. We also include published algorithms with open-source scripts in Python or R programming languages; a GitHub search in 2019 yielded three more map-matching algorithms with associated published studies [14, 28, 29].

The two algorithms most often used in cycling route choice studies [23, 30-36] were developed by Schuessler and Axhausen [16] and Dalumpines and Scott [26]. The Schuessler and Axhausen algorithm is an extension of Marchal et al. [37] which uses an advanced approach to determine the sequence of links travelled in car trips. In addition to geometric and topological approaches, the algorithm uses a Multiple Hypothesis Technique to evaluate a set of route candidates and choose the most likely route based on a scoring function that consists of the “relevant” perpendicular distance between GPS records and route links with a penalty term for GPS records with speed over the free-flow speed of the associated link. The authors reported 52–53% successfully matched routes. A study applying this algorithm to a cycling dataset reported 76% successfully matched routes after implementing filtering and map-matching steps [35]. The Dalumpines and Scott algorithm was developed in ArcGIS and finds the shortest path between the first and last GPS record on a constrained network defined by a spatial buffer around the GPS trajectory records. The authors reported an accuracy of 88%; in other contexts, cycling studies that employed this algorithm reported success in generating a route for 54–93% of trips and 38–89% accuracy by different measures [25, 30, 31].
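The buffer-constrained shortest-path idea described for Dalumpines and Scott can be illustrated with a minimal sketch. This is not the authors' ArcGIS implementation: the function name, the planar coordinates, and the simplified buffer test (a link is kept only if both of its end nodes lie within the buffer distance of some GPS record) are assumptions made for illustration.

```python
import heapq
import math


def constrained_shortest_path(nodes, links, gps, buffer_m):
    """Shortest path on the part of the network near the GPS track.

    nodes: {node_id: (x, y)} in planar metres (assumed projection)
    links: [(from_id, to_id, length_m)] directed links
    gps:   [(x, y), ...] ordered GPS records
    Returns a list of node ids, or None when no path exists in the buffer.
    """
    def near_track(point):
        return any(math.dist(point, g) <= buffer_m for g in gps)

    allowed = {n for n, xy in nodes.items() if near_track(xy)}
    if not allowed:
        return None

    # Simplified buffer test: keep a link only if both end nodes are allowed.
    graph = {}
    for u, v, length in links:
        if u in allowed and v in allowed:
            graph.setdefault(u, []).append((v, length))

    start = min(allowed, key=lambda n: math.dist(nodes[n], gps[0]))
    goal = min(allowed, key=lambda n: math.dist(nodes[n], gps[-1]))

    # Dijkstra's algorithm on the constrained graph.
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == goal:
            path = [u]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1]
        if cost > best.get(u, math.inf):
            continue
        for v, length in graph.get(u, []):
            new_cost = cost + length
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    return None
```

Returning `None` when the buffered network is disconnected mirrors the situation in which such an algorithm fails to generate a route for a trajectory.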

The Li [38] algorithm was developed for a cycling route choice study in Toronto. Similar to Dalumpines and Scott, ArcGIS is used for the main part of the algorithm. The algorithm assigns GPS records to links based on their proximity and uses voting based on neighbour records for the records that are close to intersections or located near parallel links. Li et al. [27] reported 63% of records were successfully map-matched by applying the Li algorithm. The Schweizer et al. [24] algorithm was also designed for cycling trips and subsequently employed in a cycling route choice study [39]. The algorithm essentially uses the shortest path algorithm where the cost of links is determined by the presence of GPS records around the links as well as the presence of cycling facilities.

“PgMapMatch” is an algorithm with an open-source Python script developed by Millard-Ball et al. [14], that takes advantage of temporal information in GPS records in addition to using a combined geometric and topological approach. This algorithm gives a measure of map-matching reliability (PgMapMatch score) which is the output from a calibrated logistic regression model. The authors report correct identification of 99% and 87% of used links in two testing datasets.

The Perrine et al. [28] algorithm has an open-source Python script and finds the shortest path between the projections of GPS records onto links. The algorithm was applied to transit General Transit Feed Specification (GTFS) data and probe-vehicle GPS data with two underlying networks: one used for dynamic traffic assignment and the other extracted from Open Street Map (OSM). The authors report an accuracy of over 99% for GPS data with the OSM network. Another algorithm with an open-source script, developed by Camargo et al. [29] for truck GPS data, utilizes the shortest path concept by giving links closer to GPS records a lower cost. They also allow for the optional inclusion of increased costs for parallel links by using GPS heading information. This algorithm is not included in the evaluation below because it was developed for an older version of Python and is no longer executable.1

See Supplementary Material for further details on the structures of the six algorithms included in this study. All six algorithms have a geometric component but the designs differ. Schuessler and Axhausen, Millard-Ball et al., and Perrine et al. use the distance between GPS records and a candidate link while Dalumpines and Scott, Li, and Schweizer et al. use a fixed buffer size around network links or GPS trajectories. Route topology (connectivity of the map-matched links) is considered in all algorithms except Li. Another distinction is that the Li algorithm is the only one to neglect link directionality, allowing wrong-way travel. All the algorithms employ a shortest-path sub-algorithm, either for gap treatment (Schuessler and Axhausen, Li, and Millard-Ball et al.) or in generating the map-matched route itself (Dalumpines and Scott, Schweizer et al., and Perrine et al.). Schuessler and Axhausen and Millard-Ball et al. are the only algorithms to consider the length similarity of the map-matched route to the GPS trajectory; Schuessler and Axhausen further compare the direction of the route to that of the GPS trajectory.

3 METHOD

Figure 1 illustrates the study methodology. Raw GPS data from real-world walking and cycling trips are filtered and cleaned and then strategically sampled to create an evaluation dataset of sample trajectories with ground-truth routes based on visual inspection. Routes are then generated for the sample trajectories by each of the map-matching algorithms and evaluated using a set of 11 diverse evaluation measures (Objective 1). Modifications of the best-performing algorithm (based on the evaluation results) are tested to enhance the application to active travel data (Objective 2). Next, an error indicator is developed from the evaluation measures that can be applied without ground-truth data to flag unreliable map-matching results in future analyses (Objective 3). The error indicator is applied to map-matched routes from the complete set of cleaned trajectory data and compared with contextual variables (GPS trajectory and network characteristics) to identify sources of map-matching errors (Objective 4).

FIGURE 1. Study methodology

3.1 Data

The GPS travel data for this study come from an active travel survey conducted in 2017 in metropolitan Vancouver, Canada [40]. Participants were persons of at least 14 years of age who “typically cycle at least once a week”. Participants recorded one week of their active travel (walking, cycling, running etc.) using a smartphone application recording GPS-based locations at 1-s intervals. The recording was manually started and stopped for each trip. Participants also indicated travel mode and trip purpose in the application. Only basic GPS data (timestamp and location) were recorded. Most participants used their personal smartphones to complete the survey; device details were not recorded. Recruitment and data collection occurred from June through October 2017.

Transit segments were visually identified and removed along with trajectories outside the study area (metropolitan Vancouver). Then the following filtering rules were applied to the remaining raw GPS trajectories (see Supporting Information for details).
  1. Remove records with missing timestamp, latitude, or longitude,

  2. Identify duplicate timestamps and if present, keep only the earliest one, consistent with Zhou et al. [41],

  3. Reorder or remove reverse time sequences,

  4. Remove location errors based on “position jumps” in which the travelled distance is greater than a buffer of 30 m and at a speed of at least 50 m/s, based on Schuessler and Axhausen [2],

  5. Split trajectories at a time gap greater than 8 h, presuming that it represents an activity, consistent with Biljecki [42], and

  6. Remove trajectories with fewer than 120 records, which are unlikely to represent real trips and typically not useful in travel analysis.
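The filtering rules above can be sketched in plain Python as follows. This is an illustrative sketch, not the study's code: the tuple layout `(t_seconds, lat, lon)`, the function names, the choice to reorder (rather than remove) reversed time sequences, and the haversine distance are assumptions; the thresholds are taken from the rules as listed.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def filter_records(records, jump_m=30.0, jump_speed=50.0):
    """Apply rules 1-4 to a list of (t_seconds, lat, lon) tuples."""
    # Rule 1: drop records with a missing timestamp, latitude, or longitude.
    rows = [r for r in records if None not in r]
    # Rule 3 (reorder variant): stable sort by timestamp.
    rows.sort(key=lambda r: r[0])
    # Rule 2: for duplicate timestamps keep the first record encountered.
    deduped = []
    for r in rows:
        if not deduped or r[0] != deduped[-1][0]:
            deduped.append(r)
    # Rule 4: drop position jumps (> jump_m travelled at >= jump_speed m/s).
    kept = []
    for r in deduped:
        if kept:
            d = haversine_m(kept[-1][1], kept[-1][2], r[1], r[2])
            dt = r[0] - kept[-1][0]  # > 0 after de-duplication
            if d > jump_m and d / dt >= jump_speed:
                continue
        kept.append(r)
    return kept


def split_and_prune(rows, max_gap_s=8 * 3600, min_records=120):
    """Rules 5-6: split at long time gaps, then drop short trajectories."""
    trips, current = [], []
    for r in rows:
        if current and r[0] - current[-1][0] > max_gap_s:
            trips.append(current)
            current = []
        current.append(r)
    if current:
        trips.append(current)
    return [t for t in trips if len(t) >= min_records]
```

A typical call would be `split_and_prune(filter_records(raw_records))`, applied to the records of one recording at a time.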

After filtering, a trip identification algorithm was applied to identify travel segments, based on an algorithm proposed by Tran et al. [43], which was improved for active travel applications [20]. Finally, Support Vector Machine and C5.0 algorithms were applied to infer missing values of travel mode or trip purpose for 292 and 324 trips, respectively. This method achieved 95% and 87% accuracy for inferring mode and purpose, respectively, in a training dataset. The final dataset contains 2042 GPS trip trajectories recorded by 144 individuals, with 88% recorded on bicycles (or e-bicycles) and 12% on foot. The majority of trips were for work or school commuting (53%), followed by leisure or exercise (20%), personal errands (18%), and other purposes (9%).

Street network data for metropolitan Vancouver were extracted from OSM by overpassID [44]. All available tags were added to the extracted map by OpenStreetMap toolbox in ArcGIS. The OSMnx package in Python was used to create the node layer, with network topology consistent with vertical relationships in the OSM “layer” tag [45]. To construct a unidirectional network from the bidirectional OSM network, the streets with the tag “oneway” of FALSE were selected and copied into another layer, the direction of the remaining links was reversed, and then the two layers were merged in ArcGIS. The unidirectional network consists of 449,647 links and 169,894 nodes. Land cover classification data were obtained from the Metro Vancouver open data catalogue2 with three classes: Built-up, bare, and vegetation (including tree cover).
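One reading of the network construction step is that every street becomes one directed link, and every two-way street additionally receives a reversed copy. The sketch below illustrates that reading only; the study performed this step in ArcGIS, and the dictionary keys and the `_fwd`/`_rev` id suffixes are hypothetical.

```python
def to_directed_links(osm_links):
    """Expand OSM-style links into one-directional links.

    osm_links: list of dicts with keys 'id', 'from', 'to', 'oneway' (bool).
    Every link is kept in its digitized direction; links that are not
    one-way additionally get a reversed copy.
    """
    directed = []
    for link in osm_links:
        directed.append(
            {"id": f"{link['id']}_fwd", "from": link["from"], "to": link["to"]}
        )
        if not link["oneway"]:
            directed.append(
                {"id": f"{link['id']}_rev", "from": link["to"], "to": link["from"]}
            )
    return directed
```

With this representation, allowing wrong-way travel on a one-way street amounts to adding the missing reversed link (possibly at a higher cost), which is a design choice left to the map-matching step.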

With consideration of the computational time costs of the map-matching algorithms (some of which were extremely long – see Section 4.3 below) and of manually determining the ground-truth sequence of links, a subset of the full GPS dataset was used for evaluation, with ground-truth routes identified by visual inspection. Trajectories from the full dataset were strategically sampled to obtain a representative combination of trips with respect to network density and average heading change. Trajectory link density was computed as the arithmetic mean of the link density values for each record, based on a link density raster with a cell size of 250 m. The link density raster was generated in ArcGIS using the Line Density feature. The average heading change for each trajectory was calculated using the equations in Zhou et al. [41].

Trajectories in the middle two duration quartiles (8 to 30 min) were segmented into nine strata: Combinations of link density tertiles and heading change tertiles (with 100–156 trajectories in each stratum). Seven trajectories were then randomly selected from each stratum to generate the evaluation sample of 63 trip trajectories. Table 1 reports the statistical attributes of the sample trajectory set, which represents the full dataset well. These trips were each visually inspected by two researchers to identify the sequence of network links traversed in the trip (i.e. the ground-truth route) based on judgment. This approach uses around twice the sample size of past research assessing the performance of map-matching algorithms with ground-truth data [46, 47].
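The stratified sampling described above (three link density tertiles crossed with three heading change tertiles, seven random trajectories per stratum) can be sketched as follows. The field names, the simple index-based tertile cut points, and the fixed random seed are assumptions for illustration and may differ from the procedure actually used.

```python
import random


def tertile(value, sorted_values):
    """Return 0, 1, or 2 for the tertile of `value` within `sorted_values`."""
    n = len(sorted_values)
    low_cut = sorted_values[n // 3]
    high_cut = sorted_values[(2 * n) // 3]
    if value < low_cut:
        return 0
    return 1 if value < high_cut else 2


def stratified_sample(trips, per_stratum=7, seed=0):
    """Sample `per_stratum` trips from each density x heading-change stratum.

    trips: list of dicts with keys 'id', 'link_density', 'heading_change'.
    """
    densities = sorted(t["link_density"] for t in trips)
    headings = sorted(t["heading_change"] for t in trips)
    strata = {}
    for t in trips:
        key = (tertile(t["link_density"], densities),
               tertile(t["heading_change"], headings))
        strata.setdefault(key, []).append(t)
    rng = random.Random(seed)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample
```

With nine non-empty strata and seven draws each, the function returns 63 trips, matching the size of the evaluation sample.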

TABLE 1. Descriptive statistics of the evaluation sample and the full dataset

| Attribute | Evaluation sample (N = 63) | Full dataset (N = 2042) |
| --- | --- | --- |
| Travel mode | | |
| Bike/e-bike | 92% | 88% |
| Walk/run | 8% | 12% |
| Travel purpose | | |
| Commute | 52% | 53% |
| Errand/leisure | 37% | 35% |
| Exercise | 3% | 3% |
| Other | 8% | 9% |
| Cumulative distance [1] (km) | 5.2 (2.6) | 7.1 (7.4) |
| Average speed (km/h) | 16.9 (5.8) | 16.9 (6.7) |
| Missing records proportion [2] (%) | 15.7 (23.0) | 16.1 (20.9) |
| # of trajectories per individual | 1.3 (0.6) | 14.2 (7.9) |

  • [1] Bottom four rows give mean (standard deviation).
  • [2] Missing records proportion $= \left[ 1 - \frac{\#\ \text{of records}}{\text{duration of trip}} \right] \cdot 100\%$.

3.2 Evaluation measures

Evaluation analysis was completed in R, Python, and ArcGIS. Any pre-map-matching filtering steps in the original studies were excluded to use the consistent set of filtering steps described above. The Li algorithm does not consider link directionality, so accuracy is measured considering both directions of network links within its map-matched routes. To be consistent with the other algorithms, for Perrine et al. we use the matched routes without the suggested manual corrections for poorly matched trajectories. As free-flow speed for links is determined only for car travel, we omit that term from the route scoring function in Schuessler and Axhausen (refer to Supporting Information for details).

Eleven evaluation measures are used to assess the performance of each algorithm, mostly based on evaluation approaches in the literature, but with a novel measure as well. The first six measures, based on the literature, compare a map-matched route to the ground-truth route.
  1. Success rate compares the map-matched route to the ground-truth route to see if they are identical, with a binary outcome [14].

  2. Route mismatch fraction is the cumulative length erroneously subtracted or added to the ground-truth route by the map-matching algorithm, divided by the length of the ground-truth route [18, 48]. See Figure 2 for an illustration.

  3. The overlapping ratio is the ratio of the number of common links between the ground-truth and map-matched routes to the union of all the links present in the two [46, 47]. See Figure 3 for an illustration.

  4. Recall is the ratio of the cumulative length of common links between ground-truth and map-matched routes to the length of the ground-truth route [49]. See Figure 4 for an illustration.

  5. Precision is the ratio of the cumulative length of common links between ground-truth and map-matched routes to the length of the map-matched route [49]. See Figure 5 for an illustration.

  6. F-measure is computed as $2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$ [50].
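The six ground-truth measures above can be computed from link sequences and link lengths, as in the following sketch. The function name and input layout are assumptions, and the sketch treats each route as a set of distinct links (a route that traverses the same link twice is not handled).

```python
def ground_truth_measures(truth, matched, lengths):
    """Compare a map-matched route to the ground-truth route.

    truth, matched: ordered lists of link ids (both assumed non-empty)
    lengths: {link_id: length in metres}
    """
    truth_set, matched_set = set(truth), set(matched)
    common = truth_set & matched_set
    union = truth_set | matched_set

    len_truth = sum(lengths[k] for k in truth_set)
    len_matched = sum(lengths[k] for k in matched_set)
    len_common = sum(lengths[k] for k in common)

    missed = len_truth - len_common    # length erroneously subtracted
    added = len_matched - len_common   # length erroneously added
    recall = len_common / len_truth
    precision = len_common / len_matched
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0

    return {
        "success": truth == matched,  # identical link sequence
        "route_mismatch_fraction": (missed + added) / len_truth,
        "overlapping_ratio": len(common) / len(union),
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }
```

Because the added length is unbounded, the route mismatch fraction can exceed 1 when a matched route contains more erroneous length than the ground-truth route is long.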

FIGURE 2. Illustration of route mismatch fraction, based on Newson and Krumm [18]; blue line denotes ground-truth route, red line denotes map-matched route, and green dots denote GPS records
FIGURE 3. Illustration of overlapping ratio; blue line denotes ground-truth route, red line denotes map-matched route, and green dots denote GPS records
FIGURE 4. Illustration of recall; blue line denotes ground-truth route, red line denotes map-matched route, and green dots denote GPS records
FIGURE 5. Illustration of precision; blue line denotes ground-truth route, red line denotes map-matched route, and green dots denote GPS records

The next four measures from the literature can be used without ground-truth data (which would be the case for most GPS datasets).
  7. The length index is the ratio of the length of the map-matched route to the cumulative distance between each consecutive pair of GPS records [24, 25, 48]. See Figure 6 for an illustration.

  8. Average distance error per record is the average distance (in meters) between each GPS record and the map-matched route [24]. To consider the order in the map-matched sequence of links, two potential corresponding matched links are considered for each record: the link that the previous record was assigned to and the subsequent link in the sequence; the record is assigned to the closer link. If the assignment becomes “stuck” on a link (assigns the same intermediate link for all remaining GPS records), only GPS records with cumulative travel distance less than the link length are assigned to that link, and the assignment recommences from the next link in the sequence of map-matched links. See Figure 7 for an illustration.

  9. Dynamic Time Warping (DTW) is the shortest warping path (in meters) that aligns the GPS trajectory to the projected trajectory, computed as the sum of distances for each record (considering link order as above) [51].

  10. Frechet distance is commonly defined as the shortest leash length that a person and their dog would need to traverse the GPS trajectory and matched sequence of links, respectively (considering link order and in meters, as with DTW) [51].
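Three of the ground-truth-free measures above can be illustrated on plain point sequences. The sketch below uses the textbook dynamic-programming forms of DTW and the discrete Frechet distance in planar coordinates (metres); it does not reproduce the link-order-constrained projection described above, so it should be read as an approximation under those assumptions. Function names are hypothetical.

```python
import math


def path_length(points):
    """Cumulative distance along an ordered list of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def length_index(route_points, gps_points):
    """Matched route length divided by cumulative GPS distance."""
    return path_length(route_points) / path_length(gps_points)


def dtw(p, q):
    """Classic DTW cost between two point sequences."""
    n, m = len(p), len(q)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(p[i - 1], q[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def discrete_frechet(p, q):
    """Discrete Frechet distance between two point sequences."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1]), d)
    return ca[n - 1][m - 1]
```

DTW accumulates distances along the alignment, so it grows with the number of records, whereas the Frechet distance reports only the worst separation along the best alignment.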

FIGURE 6. Illustration of length index; blue line denotes ground-truth route, red line denotes map-matched route, and green dots denote GPS records
FIGURE 7. Illustration of average distance error per record; blue line denotes ground-truth route, red line denotes map-matched route, and green dots denote GPS records

The last measure is proposed to account for the common issue of geometric map-matching algorithms assigning records to cross-streets at intersections, illustrated in Figure 8, which is not well reflected in the other evaluation measures. The measure is based on the intuition that the matched route should have a similar shape and orientation to the GPS trajectory.
  11. Alignment is the difference in bearing (in degrees) between the GPS trajectory and the map-matched route over each 5-record interval. The bearing of the matched route is calculated between the orthogonal projection of the first record in the interval onto the map-matched route (considering link order as above) and the point found by continuing along the matched route over a distance equal to the cumulative distance between the 4 consecutive record pairs in the 5-record interval. See the illustration in Figure 8.
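The bearing comparison behind the alignment measure can be sketched for a single 5-record interval. The sketch assumes planar coordinates (x east, y north), takes the GPS bearing from the first to the last record of the interval, and takes the two route points (the projected start and the point reached along the route) as given inputs; all function names are hypothetical.

```python
import math


def bearing_deg(p, q):
    """Compass bearing from p to q in degrees, for planar (x, y) points."""
    return math.degrees(math.atan2(q[0] - p[0], q[1] - p[1])) % 360.0


def bearing_difference(b1, b2):
    """Smallest absolute difference between two bearings (0 to 180 degrees)."""
    return abs((b1 - b2 + 180.0) % 360.0 - 180.0)


def alignment(gps_interval, route_start, route_end):
    """Bearing difference between a GPS interval and the matched route.

    gps_interval: the ordered GPS records of one interval
    route_start:  projection of the first record onto the matched route
    route_end:    point reached by moving along the route over the
                  cumulative GPS distance of the interval
    """
    gps_bearing = bearing_deg(gps_interval[0], gps_interval[-1])
    route_bearing = bearing_deg(route_start, route_end)
    return bearing_difference(gps_bearing, route_bearing)
```

For an eastbound GPS interval matched onto a northbound cross-street, the function returns a value close to 90 degrees, which is the kind of intersection error the measure is intended to expose.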

FIGURE 8. Illustrations of (a) a GPS record erroneously matched to the cross-street at an intersection, and (b) the proposed alignment evaluation measure

3.3 Processing time

Processing times for the map-matching algorithms are compared for an example bicycle trip trajectory with a cumulative distance of 7.0 km and 1026 1-s records (comparable to the 7.1 km average cumulative distance in the full dataset). All algorithms were implemented on a 64-bit Windows 10 desktop computer with 16.0 GB RAM, and generated at least one route for this trip. Caution should be exercised in interpreting results as algorithms are implemented via various software (R, Python, and ArcGIS) and scripting procedures might vary between the authors’ scripts and the open-source scripts. The steps required to prepare the input network (such as creating buffers around the network features in the Li and Schweizer et al. algorithms) are not included in the processing times, so as to represent the marginal (scaling) time cost.

3.4 Error detection

Since most GPS studies do not have ground-truth data, the use of ground-truth-dependent evaluation measures is limited. To help assess the reliability of map-matching results in the absence of ground-truth data, an error indicator is developed using the five evaluation measures above that do not rely on ground-truth data: length index, average distance error per record, DTW, Frechet distance, and alignment. Length index is transformed as the absolute value of length index minus 1 so that a reliable map-matching can be represented by lower values of the error indicator. Factor analysis – a tool to reduce data variables into a parsimonious set of factors that capture the maximum amount of common variance – is used to derive the weights for each component of the error indicator [52]. The error indicator expression is obtained by the weighted sum score method [53]. A threshold for identifying questionable matched routes is derived from the error indicator values for the ground-truth routes.
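The structure of such an error indicator (a weighted sum of the five ground-truth-free measures, with the length index transformed to its absolute deviation from 1) can be sketched as follows. The weights and the threshold in the usage note are hypothetical placeholders, not the values derived by factor analysis in this study, and the sketch does not rescale the measures.

```python
def error_indicator(measures, weights):
    """Weighted sum score over ground-truth-free evaluation measures.

    measures: dict with keys 'length_index', 'distance_error', 'dtw',
              'frechet', 'alignment' (key names are hypothetical)
    weights:  dict with the same keys (placeholder weights)
    """
    transformed = dict(measures)
    # Lower is better for every component after this transformation.
    transformed["length_index"] = abs(measures["length_index"] - 1.0)
    return sum(weights[name] * transformed[name] for name in weights)


def flag_routes(routes, weights, threshold):
    """Return ids of matched routes whose indicator exceeds the threshold."""
    return [
        route_id
        for route_id, measures in routes.items()
        if error_indicator(measures, weights) > threshold
    ]
```

In use, the threshold would be derived from indicator values computed for the ground-truth routes (for example their maximum, which is an assumption here), and flagged routes would then be inspected visually.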

The error indicator is applied to map-matching results for the entire cleaned dataset of 2042 active travel trajectories. To find possible sources of map-matching error, relationships are examined between the error indicator for the full dataset and GPS trajectory characteristics (heading change, trip duration etc.) and network characteristics (tree cover, network density etc.).

4 RESULTS

4.1 Accuracy

Table 2 gives the number of map-matched routes returned per trajectory by each of the six algorithms, as well as the success rate in perfectly matching the ground-truth routes. All the algorithms generate at least one solution for each trajectory except for Schuessler and Axhausen and Dalumpines and Scott, which both have fairly high rates of failure to identify a route (29% and 38% of trajectories, respectively). This happens when Schuessler and Axhausen discard all candidate routes based on poor internal scoring, or Dalumpines and Scott fail to find a path through the constrained network (within a buffer of the trajectory locations). These rates are toward the high end of the range of past studies that failed to find routes for 7% to 46% of trajectories by these algorithms [15, 23, 25, 35].

TABLE 2. Algorithm results for the number of routes and success rate in perfect route matching

| Algorithm | Trajectories with no map-matched route | Trajectories with 1 map-matched route | Trajectories with 2+ map-matched routes | Success rate (perfectly matched routes) |
| --- | --- | --- | --- | --- |
| Schuessler and Axhausen | 18 | 44 | 1 | 4 |
| Dalumpines and Scott | 24 | 39 | 0 | 4 |
| Li | 0 | 63 | 0 | 0 |
| Schweizer et al. | 0 | 54 | 9 | 0 |
| Millard-Ball et al. | 0 | 63 | 0 | 5 |
| Perrine et al. | 0 | 63 | 0 | 0 |

Both Schuessler and Axhausen and Schweizer et al. also return multiple matched routes for some trajectories, caused by equal scores assigned by the algorithms to the generated routes. For the rest of this Results section, average evaluation measure values are reported for the trajectories with multiple matched routes. Only half the algorithms generated the exact sequence of links in the ground-truth route for any trajectories: Schuessler and Axhausen, Dalumpines and Scott, and Millard-Ball et al., all with success rates under 8%. These perfectly-matched success rates are much lower than reported in some past studies using Millard-Ball et al. and Dalumpines and Scott, mostly for car trips [14, 26, 30, 31].
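The percentages quoted in this section can be checked against the counts in Table 2 with a few lines of arithmetic. The sketch assumes that the last column of Table 2 gives counts of perfectly matched trajectories out of 63, which is consistent with the statement that all success rates are under 8%.

```python
N_TRAJECTORIES = 63

# Counts per algorithm: (no route, 1 route, 2+ routes, perfectly matched)
table2 = {
    "Schuessler and Axhausen": (18, 44, 1, 4),
    "Dalumpines and Scott": (24, 39, 0, 4),
    "Li": (0, 63, 0, 0),
    "Schweizer et al.": (0, 54, 9, 0),
    "Millard-Ball et al.": (0, 63, 0, 5),
    "Perrine et al.": (0, 63, 0, 0),
}

# Share of trajectories with no map-matched route, in percent.
failure_pct = {
    name: 100.0 * counts[0] / N_TRAJECTORIES for name, counts in table2.items()
}

# Share of trajectories matched perfectly, in percent.
success_pct = {
    name: 100.0 * counts[3] / N_TRAJECTORIES for name, counts in table2.items()
}

# Each row should account for all 63 sample trajectories.
rows_complete = all(sum(counts[:3]) == N_TRAJECTORIES for counts in table2.values())
```

Rounding `failure_pct` reproduces the 29% and 38% failure rates reported for Schuessler and Axhausen and for Dalumpines and Scott.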

Figures 9 and 10 give the results of the other five evaluation measures based on ground-truth data. Lower route mismatch fractions indicate fewer mismatched links between map-matched route and ground-truth route. Perrine et al. and Schweizer et al. have an average route mismatch fraction higher than 1 (2.73 and 1.21, respectively) which indicates the length of links included or excluded falsely in map-matched routes exceeds the length of the ground-truth route. For other algorithms, the average route mismatch fraction is usually less than 0.5, but with substantial variation across trajectories (Figure 9).
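The interpretation of a route mismatch fraction above 1 can be illustrated with a minimal sketch. It assumes that routes are represented as sets of link identifiers (so repeated traversals are ignored) and that link lengths are available in a lookup table; the function name, link identifiers, and lengths are hypothetical and are not taken from the study.

```python
def route_mismatch_fraction(matched, truth, link_length):
    """Length of falsely included plus falsely excluded links,
    divided by the length of the ground-truth route."""
    matched, truth = set(matched), set(truth)
    false_included = sum(link_length[link] for link in matched - truth)
    false_excluded = sum(link_length[link] for link in truth - matched)
    truth_length = sum(link_length[link] for link in truth)
    return (false_included + false_excluded) / truth_length


# Hypothetical link lengths in metres (illustration only).
link_length = {"a": 100.0, "b": 200.0, "c": 100.0, "x": 300.0, "y": 250.0}
truth = ["a", "b", "c"]        # 400 m ground-truth route

near_miss = ["a", "b"]         # misses link c only
detour = ["a", "x", "y"]       # misses b and c, adds x and y

print(route_mismatch_fraction(near_miss, truth, link_length))  # 0.25
print(route_mismatch_fraction(detour, truth, link_length))     # 2.125
```

In the second case the falsely included and excluded links total 850 m against a 400 m ground-truth route, which is how the fraction can exceed 1.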

FIGURE 9. Route mismatch fraction
FIGURE 10. Overlapping ratio, recall, precision, and F-measure

Figure 10 confirms the same pattern observed in the route mismatch fraction. Higher values (closer to 1) for the evaluation measures presented in Figure 10 indicate better-performing algorithms. Schuessler and Axhausen, Dalumpines and Scott, and Millard-Ball et al. have the highest overlapping ratios (averaging 0.69 to 0.76 across trajectories). Perrine et al. and Schweizer et al. have much lower overlapping ratios, averaging 0.22 and 0.24; Li is in the middle with an average overlapping ratio of 0.48. This pattern generally holds for the other three evaluation measures, with the best values for Schuessler and Axhausen, followed closely by nearly identical values for Dalumpines and Scott and Millard-Ball et al., then somewhat poorer values for Li and much poorer values for Perrine et al. and Schweizer et al. Higher recall than precision values (e.g. Perrine et al.) indicate an abundance of falsely identified links in the map-matched routes in addition to many of the true links, while the opposite (e.g. Li) indicates an overly conservative algorithm with fewer falsely matched links but a low portion of the true links identified. The other four algorithms have a balance of recall and precision.
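The contrast between recall and precision described above can be sketched as follows. This is an illustration under stated assumptions: routes are treated as sets of link identifiers and the measures are weighted by link length, which may differ from the exact definitions used in the study. All names and values are hypothetical.

```python
def precision_recall_f(matched, truth, link_length):
    """Length-weighted precision, recall, and F-measure for a matched route."""
    matched, truth = set(matched), set(truth)
    true_positive = sum(link_length[link] for link in matched & truth)
    matched_length = sum(link_length[link] for link in matched)
    truth_length = sum(link_length[link] for link in truth)
    precision = true_positive / matched_length if matched_length else 0.0
    recall = true_positive / truth_length if truth_length else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical network: six links of 100 m each.
link_length = {name: 100.0 for name in "abcdxy"}
truth = ["a", "b", "c", "d"]

over_inclusive = ["a", "b", "c", "d", "x", "y"]  # all true links plus extras
conservative = ["a", "b"]                        # only half of the true links

print(precision_recall_f(over_inclusive, truth, link_length))
print(precision_recall_f(conservative, truth, link_length))
```

The over-inclusive route has full recall but reduced precision, while the conservative route has full precision but a recall of 0.5.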

It should be noted that Schuessler and Axhausen and Dalumpines and Scott only have evaluation measures for 71% and 62% of the trajectories, respectively, which may have been the easier trajectories to match. All algorithms except Perrine et al. yield better evaluation measures when applied only on the trajectories with at least one solution from all algorithms (49% of trajectories). However, the improvements are small (under 5%).

Figures 11–14 give results for the five ground-truth-independent evaluation measures along with the same measures computed for ground-truth routes. Since travelled routes are rarely along network link centerlines, the measures for the ground-truth routes indicate the optimal values for a perfectly map-matched route. Figure 11 shows performance in terms of length index, for which values closer to 1 are better. Consistent with the recall and precision values, Perrine et al. have high length indices, indicating the inclusion of many extraneous links, while Li has index values below 1, indicating overly conservative routes missing many links (but without extraneous ones). The other algorithms produce routes with length index values averaging 0.99 to 1.06 across trajectories. Better-performing algorithms are expected to have lower average distance error (Figure 12). Dalumpines and Scott produce the lowest average distance error (mean of 49 m and standard deviation of 242 m), closest to ground-truth values (mean of 6 m and standard deviation of 11 m). The average distance error for the other algorithms is higher (with means ranging from 90 m for Schuessler and Axhausen to 745 m for Perrine et al.).
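A minimal sketch of the two measures discussed above is given here. It assumes planar coordinates in metres, takes the length index as matched route length divided by GPS travelled distance, and takes the average distance error as the mean distance from each GPS record to the nearest point on the matched route. The function names and the nearest-point definition are assumptions for illustration, not the study's implementation.

```python
import math


def path_length(points):
    """Sum of straight-line distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_sq = dx * dx + dy * dy
    if seg_sq == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def length_index(route, gps):
    """Matched route length relative to GPS travelled distance."""
    return path_length(route) / path_length(gps)


def average_distance_error(route, gps):
    """Mean distance from each GPS record to the nearest route segment."""
    segments = list(zip(route, route[1:]))
    return sum(
        min(point_segment_distance(p, a, b) for a, b in segments) for p in gps
    ) / len(gps)


gps = [(0, 5), (50, 5), (100, 5)]                 # records 5 m off the street
direct = [(0, 0), (100, 0)]                       # route along the street
detour = [(0, 0), (0, 50), (100, 50), (100, 0)]   # route around the block

print(length_index(direct, gps), average_distance_error(direct, gps))  # 1.0 5.0
print(length_index(detour, gps), average_distance_error(detour, gps))  # 2.0 15.0
```

The detour doubles the length index and triples the average distance error, which mirrors the pattern of extraneous links reported for the poorer-performing algorithms.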

FIGURE 11. Length index evaluation measure results
FIGURE 12. Average distance error evaluation measure results
FIGURE 13. Dynamic time warping (DTW × 0.001) and Frechet distance evaluation measure results
FIGURE 14. Alignment evaluation measure results

The similarity measures DTW and Frechet distance give additional measures of the aggregate distance between the GPS records and the matched network links, averaging 4663 and 21 m for the ground-truth routes, respectively (Figure 13). Algorithm-produced values for DTW and Frechet indicate the best performance for Dalumpines and Scott, Schuessler and Axhausen, and Millard-Ball et al. The alignment measure gives information about the orientation of the matched route compared to the GPS trajectory, which averaged 20 degrees difference for the ground-truth data (Figure 14). Dalumpines and Scott, Millard-Ball et al., and Schuessler and Axhausen were very close to this with average values of 20, 22, and 22 degrees, respectively. The other three algorithms averaged in the range of 24 to 33 degrees.
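The difference in magnitude between the two similarity measures follows from how they aggregate: DTW sums distances along the best alignment of the two point sequences, whereas the Frechet distance takes the largest distance along the best alignment. The sketch below uses the textbook dynamic-programming forms of DTW and the discrete Frechet distance on planar coordinates in metres; it is an assumption that these correspond to the variants used in the study.

```python
import math


def dtw(p, q):
    """Dynamic time warping: minimum summed distance over alignments."""
    n, m = len(p), len(q)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(p[i - 1], q[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def discrete_frechet(p, q):
    """Discrete Frechet distance: minimum over alignments of the maximum distance."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1]), d)
    return ca[n - 1][m - 1]


# Hypothetical GPS records running 3 m to the side of the matched route vertices.
gps = [(0, 3), (10, 3), (20, 3)]
route = [(0, 0), (10, 0), (20, 0)]

print(dtw(gps, route))               # 9.0 (3 m at each of three records)
print(discrete_frechet(gps, route))  # 3.0 (largest single offset)
```

Because DTW accumulates over all records, it grows with trajectory length, while the Frechet distance reflects only the worst departure from the route.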

Five of the 63 sample trajectories were travelled on foot; those trips have a similar duration (averaging 17 min) to the cycling trips (averaging 18 min), but at lower average speeds. The evaluation measure results are mixed when comparing map-matching performance for walking versus cycling trips. Differences in performance for the five ground-truth-based measures (route mismatch fraction, overlapping ratio, recall, precision, and F-measure) are all within 0.05 for Millard-Ball et al., Dalumpines and Scott, and Schuessler and Axhausen. The other three algorithms had more varying performance between the modes, generally better for walking than cycling.

4.2 Processing time

Table 3 gives processing times for the algorithms in seconds for the example bicycle trip (with 1026 1-s records), as well as the processing times reported in original papers. Dalumpines and Scott and Li (mainly implemented in ArcGIS) have the shortest processing times (0.06 and 0.24 s/record, respectively), followed by Millard-Ball et al. (0.43 s/record) and then Perrine et al. (0.71 s/record). Schuessler and Axhausen and Schweizer et al. have much longer processing times (8.10 and 4.85 s/record, respectively). In addition to the complexity of Schuessler and Axhausen and Schweizer et al., these algorithms are scripted in R which can be slower than ArcGIS for certain spatial analysis tasks. Processing times for the example trajectory are shorter than originally reported in Dalumpines and Scott, but longer than originally reported for Schuessler and Axhausen, Schweizer et al., and Millard-Ball et al. These differences may be due to the size and complexity of the trajectory and network data, the algorithm coding in the scripts (for the non-open-source algorithms), and the operating machines and software.

TABLE 3. Processing times for map-matching algorithms
Algorithm | Implementation software | Open-source script | Processing time (s/trajectory) | Processing time (s/record) | Processing time (s/matched meter) | Reported processing time in the original paper
Schuessler and Axhausen | R | No | 8310 | 8.10 | 1.17 | 0.01–0.08 s/record
Dalumpines and Scott | ArcGIS and R | No | 61 | 0.06 | 0.01 | 6 s/record
Li | ArcGIS and R | No | 244 | 0.24 | 0.05 | Not reported
Schweizer et al. | ArcGIS and R | No | 4979 | 4.85 | 0.70 | 0.002 s/matched meter
Millard-Ball et al. | Python | Yes | 445 | 0.43 | 0.06 | 14 s/trajectory
Perrine et al. | Python | Yes | 731 | 0.71 | 0.04 | Not reported
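The per-record times in Table 3 follow from dividing the per-trajectory time by the 1026 records in the example bicycle trip. The short check below reproduces that arithmetic from the tabulated values; the variable names are illustrative.

```python
RECORDS = 1026  # 1-s records in the example bicycle trip

seconds_per_trajectory = {
    "Schuessler and Axhausen": 8310,
    "Dalumpines and Scott": 61,
    "Li": 244,
    "Schweizer et al.": 4979,
    "Millard-Ball et al.": 445,
    "Perrine et al.": 731,
}

# Normalize by record count so trips of different duration can be compared.
seconds_per_record = {
    name: round(total / RECORDS, 2)
    for name, total in seconds_per_trajectory.items()
}

for name, value in seconds_per_record.items():
    print(f"{name}: {value:.2f} s/record")
```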

4.3 Evaluation summary

Considering all the measures, Millard-Ball et al. is the best-performing algorithm overall; it generated a unique route for every trajectory and achieved the highest success rate. Compared to the other algorithms, Millard-Ball et al. map-matched routes have on average: low deviation from the ground-truth route (route mismatch fraction of 0.38), high coverage of the ground-truth route (overlapping ratio of 0.69), and high accuracy (F-measure of 0.82). The routes generated by Millard-Ball et al. also have lengths similar to GPS travelled distance (average length index of 1.04, versus a ground-truth value of 1.02). Two other algorithms (Schuessler and Axhausen and Dalumpines and Scott) had similar performance across most of the accuracy measures but crucially failed to generate routes for 29% and 38% of trajectories, respectively. The other three algorithms yielded routes with substantially poorer accuracy measures.

The results for Millard-Ball et al. were least strong with respect to the GPS trajectory shape and bearing (average distance error, similarity measures, and alignment results). Visual inspection indicates that the errors were often related to wrong-way travel on one-way streets. Discarding the results of trajectories for which Dalumpines and Scott or Schuessler and Axhausen did not generate a solution (32 trajectories), Millard-Ball et al. produce better average distance error, DTW, Frechet distance, and Alignment values. The issue of wrong-way travel is examined further below.

Comparing results across measures illustrates the importance of considering multiple dimensions of performance in accuracy assessments. When algorithms produce an incorrect sequence of links (as is the case the vast majority of the time), we want to know not just how much of the true route they have identified, but also whether they are including many false links, missing many used links, or departing widely from the travelled path.

4.4 Error indicator

The expression for the composite error indicator obtained by factor analysis is $0.39\ LI + 0.94\ ADE + 0.67\ A + 0.95\ DTW + 0.74\ FD$, combining the normalized values of transformed length index (LI), average distance error per record (ADE), alignment (A), dynamic time warping (DTW), and Frechet distance (FD) for each route generated by Millard-Ball et al. The indicator explained 59% of the variance in the trajectory-level data, with the Kaiser–Meyer–Olkin measure (0.53) and Bartlett's test of sphericity (p < 0.01) confirming sampling adequacy and sufficient correlation between variables, respectively [52]. The error indicator values are moderately correlated to the “PgMapMatch score” for matched routes by Millard-Ball et al., with a correlation between the two of −0.37 (reliable map-matching should have a high PgMapMatch score and low error indicator value, hence the negative correlation). Visual inspection reveals that the PgMapMatch score and the error indicator are misaligned where a traveller traversed a one-way street in the wrong direction.
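As a minimal sketch, the composite indicator is a weighted sum of the five normalized measures. The function below assumes the inputs have already been transformed and normalized as described in the methods; that step is not reproduced here, and the input values in the example are hypothetical.

```python
# Factor loadings from the composite error indicator expression.
WEIGHTS = {"LI": 0.39, "ADE": 0.94, "A": 0.67, "DTW": 0.95, "FD": 0.74}


def error_indicator(normalized):
    """Weighted sum of already-normalized measures for one matched route."""
    missing = set(WEIGHTS) - set(normalized)
    if missing:
        raise KeyError(f"missing measures: {sorted(missing)}")
    return sum(WEIGHTS[name] * normalized[name] for name in WEIGHTS)


# Hypothetical normalized measure values for a single matched route.
route_measures = {"LI": 0.1, "ADE": 0.1, "A": 0.1, "DTW": 0.1, "FD": 0.1}
print(round(error_indicator(route_measures), 3))  # 0.369
```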

To determine a threshold for identifying potentially erroneous map-matching results, the error indicator was calculated for the ground-truth routes, yielding a 99th-percentile value of 0.6, a 95th-percentile value of 0.5, an average of 0.2, and a standard deviation of 0.1. Based on this, we select an error indicator threshold value of 0.5, above which map-matched trajectories are flagged as potentially unreliable and requiring visual inspection. Applying this threshold flags 10 of the routes matched by Millard-Ball et al. The ground-truth-based evaluation measures for these routes (averaging 0.43 route mismatch fraction, 0.66 overlapping ratio, 0.82 recall, 0.78 precision, and 0.80 F-measure) are poorer than for the other routes, showing that the error indicator (computed without ground-truth data) is aligned with the ground-truth-based measures. Visual inspection of these routes and the GPS trajectories reveals that the most common sources of error are (see Figure 15 for illustrations):
  1. Cyclists travelling in the wrong direction on a one-way street (which is not allowed in Millard-Ball et al. routes),

  2. Travel on paths that are not represented in the network data (missing links), and

  3. Matched routes on parallel street facilities.
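The thresholding step that precedes this list can be sketched as follows: take a high percentile of the error indicator over ground-truth routes as the threshold, then flag matched routes whose indicator exceeds it. The nearest-rank percentile method, the route identifiers, and all indicator values below are assumptions for illustration; the study does not state which percentile method was used.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (pct given as an integer from 1 to 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def flag_routes(indicators, threshold):
    """Return IDs of matched routes whose indicator exceeds the threshold."""
    return [route_id for route_id, value in indicators.items() if value > threshold]


# Hypothetical error indicator values for 20 ground-truth routes.
ground_truth_values = [0.1] * 6 + [0.2] * 8 + [0.3] * 4 + [0.5, 0.6]
threshold = nearest_rank_percentile(ground_truth_values, 95)
print(threshold)  # 0.5

# Hypothetical indicator values for three matched routes.
matched = {"trip_1": 0.2, "trip_2": 0.7, "trip_3": 0.5}
print(flag_routes(matched, threshold))  # ['trip_2']
```

Flagged routes are candidates for visual inspection rather than confirmed errors, since the indicator is computed without ground-truth data.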

FIGURE 15. Illustrations of map-matching errors caused by (a) one-way streets, (b) missing network links, and (c) parallel facilities; circles are GPS records, red lines are map-matched routes (with direction arrows), and black lines are network links

Error indicator values were also calculated for the map-matched routes of Millard-Ball et al. applied to the full set of 2042 GPS trajectories in the dataset. Error indicator values ranged from 0.0 to 13.4, averaging 0.5 with a standard deviation of 0.9; 19% of the matched routes are flagged as potentially unreliable (error indicator > 0.5). Table 4 gives Pearson correlation coefficients between the error indicator values and trajectory and network characteristics for each matched trip in the full dataset. Significantly higher error indicator values are associated with: longer trips, trips with speed outliers, and trips with more bridges and underpasses, one-way streets, and cycling facilities in the surrounding network. More one-way streets increase opportunities for wrong-way travel, and more cycling facilities increase opportunities for mismatching to a parallel facility. Similar to these results, Chao et al. [6] also report map-matching errors associated with trajectory outliers; in contrast, they report greater map-matching errors in more complex networks whereas we found no significant association with link density.

TABLE 4. Relationships between error indicator and trajectory and network characteristics for 2042 trips
Trajectory or network characteristic | Definition | Average (standard deviation) | Correlation coefficient with error indicators
Length | Sum of travelled distance between consecutive pairs of GPS records (m) | 7062 (7411) | 0.32*
Travel mode | Travel mode (bike = 1, other = 0) | Bike = 1789 | 0.05
Average heading change | Average heading change between consecutive pairs of GPS records (deg) | 13.2 (8.8) | <0.01
Gap proportion | Seconds of missing data (time differences more than sampling interval) divided by the duration of the trajectory | 0.16 (0.21) | <0.01
Speed outlier | Difference between maximum and average speed (m/s) in the trajectory | 11.9 (12.6) | 0.09*
Network density | Average link density for trip | 4908 (368) | 0.02
Bridge/underpass proportion | Total length of overpass/underpass links (from OSM) intersecting (completely or partially) a 50-m buffer around the trajectory divided by the total length of links intersecting the buffer | 0.05 (0.07) | 0.05*
Tree canopy proportion | Total length of links with tree canopy land cover intersecting (completely or partially) a 50-m buffer around the trajectory divided by the total length of links intersecting the buffer | 0.26 (0.13) | 0.03
One-way street proportion | Total length of one-way streets intersecting (completely or partially) a 50-m buffer around the trajectory divided by the total length of links intersecting the buffer | 0.11 (0.08) | 0.06*
Cycling facility proportion | Total length of cycling facilities (from OSM) intersecting (completely or partially) a 50-m buffer around the trajectory divided by the total length of links intersecting the buffer | 0.28 (0.12) | 0.07*
  • a for travel mode (binary variable), average error indicator difference is reported (bike minus walk), with t-test significance.
  • * significant at p < 0.05.

5 PROPOSED MODIFICATIONS

5.1 Millard-Ball et al. (PgMapMatch)

As described above, Millard-Ball et al. performed the best overall of the tested algorithms. However, it was not developed specifically for active travel data, so we investigated ways to enhance performance for application to walking and cycling trips. Based on the error sources identified above, three modifications are proposed:
  1. The link cost function is changed to be based on length instead of travel time since speed limits have little relevance for active travel speeds,

  2. Wrong-way travel on one-way streets (“salmoning”) is allowed instead of prohibited in routes, but penalized by a link cost multiplier of 1.5–6, based on Broach et al. [54], and

  3. The link cost of cycling facilities is decreased by a factor of 0.1–0.8, based on cycling route choice studies [23, 34, 54, 55].
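The effect of the three modifications on a link cost can be illustrated with a small sketch. It is not the PgMapMatch implementation: it only applies a length-based cost with the two adjustments and runs a generic least-cost path search on a hypothetical three-node network. Treating both the salmoning penalty and the cycling facility factor as multipliers on link length is an assumption, as are all names and default values.

```python
import heapq


def link_cost(length_m, wrong_way=False, cycling_facility=False,
              salmon_multiplier=1.5, facility_factor=0.8):
    """Length-based link cost with wrong-way and cycling facility adjustments."""
    cost = float(length_m)
    if wrong_way:
        cost *= salmon_multiplier   # allowed but penalized
    if cycling_facility:
        cost *= facility_factor     # cheaper than an ordinary link
    return cost


def least_cost_path(links, source, target, **params):
    """Dijkstra search; links are (from, to, length, one_way, cycling_facility)."""
    graph = {}
    for u, v, length, one_way, facility in links:
        graph.setdefault(u, []).append((v, link_cost(length, False, facility, **params)))
        # Reverse traversal is unpenalized on two-way links, penalized on one-way links.
        graph.setdefault(v, []).append((u, link_cost(length, one_way, facility, **params)))
    queue, visited = [(0.0, source, [source])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, step in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return float("inf"), []


# Hypothetical network: A->B is a 100 m one-way street; B-C and C-A are two-way.
links = [("A", "B", 100, True, False),
         ("B", "C", 150, False, False),
         ("C", "A", 150, False, False)]

print(least_cost_path(links, "B", "A"))                         # (150.0, ['B', 'A'])
print(least_cost_path(links, "B", "A", salmon_multiplier=6.0))  # (300.0, ['B', 'C', 'A'])
```

With a low multiplier the wrong-way link is preferred over the 300 m detour; with a high multiplier the detour becomes cheaper, which is why the multiplier value matters for matching observed salmoning.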

For the second and third modifications, specific parameter values are selected for each modification individually using the six ground-truth-dependent measures. In addition, jointly optimal parameter values are identified using the same measures by applying the three modifications combined through a grid search of possible parameter values. The optimal combination was selected based on the magnitude of improvement for the majority of the measures.
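A grid search of this kind can be sketched as below. Several parts are assumptions for illustration: the step sizes of the parameter grids, the rule of ranking candidates by the number of measures improved and then by total improvement, and the `evaluate` callback, which stands in for running the modified algorithm and computing average ground-truth measures. The toy evaluation function is not based on study data.

```python
from itertools import product

# Assumed grids spanning the stated ranges (step sizes are an assumption).
SALMON_MULTIPLIERS = [1.5 + 0.5 * i for i in range(10)]      # 1.5 to 6.0
FACILITY_FACTORS = [round(0.1 * i, 1) for i in range(1, 9)]  # 0.1 to 0.8


def improvement(candidate, baseline, higher_is_better):
    """Signed gain per measure, positive when the candidate is better."""
    gains = {}
    for name, better_high in higher_is_better.items():
        delta = candidate[name] - baseline[name]
        gains[name] = delta if better_high else -delta
    return gains


def grid_search(evaluate, baseline, higher_is_better):
    """Pick the parameter pair improving the most measures, then the largest total gain."""
    best_key, best_params = None, None
    for salmon, facility in product(SALMON_MULTIPLIERS, FACILITY_FACTORS):
        gains = improvement(evaluate(salmon, facility), baseline, higher_is_better)
        key = (sum(g > 0 for g in gains.values()), sum(gains.values()))
        if best_key is None or key > best_key:
            best_key, best_params = key, (salmon, facility)
    return best_params


def toy_evaluate(salmon, facility):
    """Stand-in for map-matching plus evaluation (hypothetical response surface)."""
    penalty = 0.01 * abs(salmon - 1.5) + 0.05 * abs(facility - 0.8)
    return {"overlapping_ratio": 0.9 - penalty, "route_mismatch_fraction": 0.1 + penalty}


baseline = {"overlapping_ratio": 0.7, "route_mismatch_fraction": 0.3}
higher_is_better = {"overlapping_ratio": True, "route_mismatch_fraction": False}
print(grid_search(toy_evaluate, baseline, higher_is_better))  # (1.5, 0.8)
```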

Table 5 gives average ground-truth evaluation measures for the modified algorithm (recall and precision were similar to the F-measure). The modifications each improve algorithm performance modestly by all measures except success rate. The distance-based cost modification is the most impactful, followed by the cycling facility adjustment and then the salmoning penalty. But the improvements are not additive: the combined modification (all three) is only better than the distance-based cost improvement alone for the success rate measure. The jointly optimal modifications (distance-based cost, salmoning penalty factor of 1.5, and cycling facility factor of 0.8) yield a further enhancement of all evaluation measures except success rate.

TABLE 5. Average ground-truth evaluation measures for modifications to Millard-Ball et al.
Modification | Success rate | Overlapping ratio | Route mismatch fraction | F-measure
Original | 5 | 0.69 | 0.38 | 0.82
Distance-based cost | 5 | 0.78 | 0.19 | 0.91
Salmoning penalty factor of 4.5 | 5 | 0.71 | 0.34 | 0.83
Cycling facility factor of 0.1 | 7 | 0.76 | 0.23 | 0.89
Combined modifications | 6 | 0.79 | 0.18 | 0.91
Jointly optimal modifications | 5 | 0.81 | 0.14 | 0.93

5.2 Other algorithms

Optimal modifications of all the algorithms are beyond the scope of this study, but an examination of the error sources suggests possible directions for improved performance on walking and cycling data. Schuessler and Axhausen fail to find solutions for 29% of trajectories, but visual inspection of the discarded candidate routes reveals that some of the routes were accurate. Thus, thresholds for the filtering step could be modified. Another common error occurs when the algorithm falsely determines that the end of the candidate link has been reached using distance-based heuristics. To address this issue, filtering and smoothing of the GPS records might enhance algorithm performance by reducing distance-inflating location noise. Finally, the prohibition on U-turns in the matched routes also appears to be a barrier to correct route identification in some cases and should be reconsidered.

Dalumpines and Scott also fail to find solutions for a large portion of the trajectories. Removing the restriction of one-way directionality reduces the failure rate from 38% to 10%. Alternatively, enlarging the trajectory buffer used to constrain the network would allow the algorithm to find a path around one-way streets (although the identified path would be erroneous if the traveller had in fact used a wrong-way link). Dalumpines and Scott can also fail to generate a correct solution when the trip origin and destination are located near to each other, which is the case for loop/exercise trips. To address this, Kam et al. [25] proposed adding three intermediate stops to overcome the issue of loop trips.

For Li, link assignment for some records is based on voting among links assigned to the ten preceding and ten succeeding records. This fails for short links and for records adjacent to missing data in longer time gaps. The algorithm performs worse, and is more likely to create a discontinuous route, in dense parts of a network. Schweizer et al. generated erroneous routes far from the trajectory because of an over-emphasis on using cycling facilities, particularly when there were parallel low-traffic local streets. This algorithm could be enhanced by re-examination or recalibration of the link cost function, possibly also including other variables in the function such as road grade. Perrine et al. generated the least accurate routes, substantially poorer than reported in the original paper, possibly because it was developed for higher-speed vehicles on a sparser network. The algorithm might also rely heavily on the suggested manual corrections after the initial route assignment, which were not implemented here.
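The voting step described for Li, and why it can fail on short links, can be illustrated with a minimal sketch. It assumes a simple majority vote over the links assigned to the ten records before and after the record in question, with unassigned records (None) ignored; which records are subject to voting and how ties are broken in the original algorithm are not specified here, so those details are assumptions.

```python
from collections import Counter


def vote_link(assigned, i, window=10):
    """Majority link among the `window` records before and after record i."""
    neighbours = assigned[max(0, i - window):i] + assigned[i + 1:i + 1 + window]
    votes = Counter(link for link in neighbours if link is not None)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical 1-s records: 12 on link "main", 2 on a short link, 12 on "next".
assigned = ["main"] * 12 + ["short"] * 2 + ["next"] * 12

print(vote_link(assigned, 5))   # main
print(vote_link(assigned, 12))  # main, although record 12 lies on "short"
```

A link traversed in only a couple of records is outvoted by its longer neighbours, so it drops out of the matched route and the route can become discontinuous.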

6 CONCLUSION

In this evaluation of map-matching algorithms using real-world active travel data from metropolitan Vancouver, Canada, the PgMapMatch algorithm from Millard-Ball et al. [14] proved to be the most reliable and to have a reasonably low processing time. Most of the performance measures for matched routes by Schuessler and Axhausen and Dalumpines and Scott were similar, but these algorithms failed to generate matched routes for around a third of trajectories (inflating aggregate performance measures for the remaining routes). The other three algorithms yielded routes with substantially poorer performance measures.

These best-performing algorithms were not developed specifically for active travel data. Our results suggest that PgMapMatch could be further improved for active travel by using distance instead of travel time to generate link costs, lowering link costs for off-street facilities, and allowing but penalizing wrong-way travel. Adjustments to Schuessler and Axhausen (filtering thresholds for candidate routes and smoothing and filtering for GPS errors) and Dalumpines and Scott (allowing wrong-way travel and widening search buffer for candidate links) could allow for more trajectories to be matched. Whether these modifications would also improve performance in other networks and for other active travel modes (scooters, skating etc.) is unknown and must be investigated in future work.

The best-performing algorithms (with 70–90% accuracy, depending on the measure) are still imperfect (for every algorithm, at least 92% of matched routes had at least one wrong link), so manual post-processing (error detection and correction) is still necessary for reliable map-matching. For this purpose, we propose a composite error indicator that can be calculated from the trajectory and network data alone (i.e. without ground-truth data), along with a threshold for flagging potentially inaccurate routes needing manual inspection. When inspecting, our results show that key sources of error to look for are missing links used by travellers but absent in the network data, wrong-way travel (a form of missing links), and routes mismatched to parallel facilities on the same street used by the traveller. Errors are also more common around bridges and underpasses. Speed outliers also lead to errors, so better pre-processing for speed could reduce map-matching errors. Missing links where paths are observed in GPS trajectories can be added to the network to improve map-matching results for other trips as well.

6.1 Limitations and future research

The key strength of this analysis is external, comparative validation of multiple algorithms using independent, real-world GPS data. The inherent limitation is that algorithm performance on other data sets (from other regions and collected using different methods) may vary. Our results suggest that network characteristics (e.g. presence of bridges and underpasses) and GPS trajectory characteristics (e.g. presence of speed outliers) influence map-matching errors. Map-matching algorithm performance may also depend on pre-processing steps, which should be investigated in future work.

A related issue for future research is the effect of potential gap treatment methods for missing GPS data on map-matching algorithm performance. Alleviating the influence of noise through spatial smoothing should also be investigated as a possible pre-processing step to improve map-matching performance. Another direction for future work is automated methods for identifying missing links in network data by mapping large datasets of GPS trajectories onto the street network.

Map-matching is a crucial step in GPS travel data analysis and one for which we still do not have a reliable method for all contexts or datasets. The findings in this paper help with specific guidance for analysts working with active travel datasets, and with suggested directions for future improvement of map-matching algorithms for active travel. A final suggestion is for full reporting of all pre-processing and map-matching methods used in future studies employing GPS data to advance understanding of the state-of-practice and to enhance reproducibility and reliability.

ACKNOWLEDGMENTS

This research was enabled by support from the Social Sciences and Humanities Research Council of Canada (SSHRC) Insight Development Grant #430-2019-00049, and the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant #RGPIN-2016-04034.

CONFLICT OF INTEREST

The authors declare that they have no conflicts of interest.

DATA AVAILABILITY STATEMENT

Study data are not available for sharing to protect participant privacy, per guidance from the University of British Columbia Behavioural Research Ethics Board (UBC BREB number: H17-00294).