Evaluation of small unmanned aerial system highway volume and speed‐sensing applications

Funding information: Massachusetts Department of Transportation, Grant/Award Number: INTF00X02018A0103206

Abstract: Small unmanned aerial systems (sUAS) have been utilised in the transportation industry in recent years to decrease the cost of projects and tasks while increasing safety. This is due to their ability to capture aerial images with reduced effort and time. Recently, these devices have begun to be used for traffic monitoring, given their ability to capture video above a roadway. Combined with object-tracking techniques, vehicle data such as speeds, volumes, and trajectories can be extracted, providing an opportunity to revolutionize traffic data collection techniques. There exists a need to improve upon origin–destination volume and speed data collection procedures through the development of a low-cost methodology to capture detailed data that would allow for more accurate analysis. This study evaluates a methodology and measures the accuracy of volume and speed data collected through sUAS aerial imagery using object-tracking techniques. Using the developed methodology, vehicle volumes were tracked at 93% accuracy, and vehicle speeds were recorded with a 6.6% relative error. While future improvements could be made to this methodology as technology advances, this study reveals a low-cost solution to collect vehicle data that could improve the efficiency of transportation studies and, in turn, improve safety.


INTRODUCTION
Small unmanned aerial systems (sUAS) have been utilised in the transportation industry in recent years to decrease cost and increase safety [1]. This new lightweight, low-cost technology is portable and applicable to many different tasks, including bridge inspections, 3D mapping, and crash reconstruction [1]. These devices are able to collect detailed information and capture aerial images with generally minimal effort and time. In recent years, sUAS have begun to be appreciated for applications in traffic monitoring [1][2][3][4][5]. Their ability to capture video above a roadway can be combined with object-tracking techniques to track vehicles and extract vehicle data such as speed, volume, and trajectory data [6], providing an opportunity to revolutionize traffic data collection techniques. Previously, aerial studies of highway vehicle speeds were infeasible due to the high cost of helicopters, and those conducted at ground level could only capture speed data at individual locations along roadways. As such, there exists a need to improve upon speed collection procedures by developing a low-cost methodology to capture vehicle speeds that would allow for more detailed and accurate analysis in the transportation planning process. Further, this data would allow for studies related to driver behaviour to be completed and understood at specific locations [7]. This study develops a methodology to capture vehicle volumes and speeds through aerial imagery collected via sUAS and measures the resulting accuracy of the data within the context of the United States.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2020 The Authors. IET Intelligent Transport Systems published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
While other countries have different transportation and government systems, this new form of traffic data collection could be applied in transportation planning studies throughout the globe.
The following sections review the speed limit setting process within the United States, aerial image processing, and sUAS applications, including traffic monitoring.

Speed limit setting
Traditionally, speed limits on newly constructed roadways are established from the design speed of the roadway segment, and many have remained unchanged since they were set during original construction. Thus, they may no longer be appropriate for current conditions. Speed limit modification studies are initiated in different ways, including through town or city officials receiving complaints from the public or through an investigation of crash history.

Speed limit selection process
State and local governments are responsible for setting speed limits in the United States [8]. The National Cooperative Highway Research Program Report 500 states that a speed limit should depend on design speed, vehicle operating speed, safety experience, and enforcement experience [9]. As many design factors are based upon anticipated use, a design speed does not always match the operating speed of a roadway [10,11]. Vehicle operating speed is considered from a range of 85th percentile speeds taken from various spot-speed surveys of free-flowing vehicles at specific points on a roadway segment. This 85th percentile speed is widely recognised as the most used analytical method for selecting the posted speed limit as it captures 85% of drivers' speeds [9,11].
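As a worked illustration, the 85th percentile speed can be computed directly from a spot-speed sample; the speeds below are hypothetical, not from this study:

```python
import numpy as np

# Hypothetical spot-speed survey of free-flowing vehicles (mph)
speeds = np.array([52, 55, 48, 61, 58, 57, 63, 50, 59, 56,
                   54, 62, 60, 53, 57, 65, 49, 58, 56, 61])

# 85th percentile speed: the speed at or below which 85% of
# sampled drivers travel, often used to set the posted limit
p85 = np.percentile(speeds, 85)
print(p85)
```

With linear interpolation between sorted observations, this sample yields a value just above 61 mph, which an engineer would typically round to the nearest 5 mph increment.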

Point speed capture limitation in the speed setting process
Traditionally, speed data collection methods have utilised point speed capture, with continuous speed data considered impractical to collect [12]. Point speed capture devices, such as RADAR, LiDAR, and pneumatic tubes, can each only collect speed data at a specific point. Further, using these devices can be expensive and time-consuming over a corridor if detailed data is needed. Ideally, data along a road section would be collected continuously using a single device. This type of data collection could provide new opportunities in speed limit setting to increase safety more efficiently. Today, smartphone apps and GPS devices have the ability to capture this data; however, such data cannot be limited to free-flow speeds, as information on the time headway between vehicles is missing [13,14].

Aerial image processing
To collect more detailed information at a specific location, mounted video cameras can be placed to record the roadway. These devices are used in conjunction with video image processor systems to detect vehicles as well as specific data, such as speed. This technology has been understood and utilised for several years [15,16]. Processors analyse successive video frames to extract this data using algorithms and object tracking [15]. Object tracking in the video is often separated into three distinct areas: target representation, target detection/recognition, and target tracking [6].

Target representation
To accurately detect and track an object of interest, the system must understand what it is looking for under differing conditions. This can be accomplished through feature extraction, which plays a critical role in tracking [6,17]. Feature point detection provides more stable detection of objects [18]. Common visual features utilised in these algorithms include colour, object boundaries, optical flow, and texture [17]. Using these features, the goal of target representation is to create a model of the object of interest. This model includes the object's appearance, size, and shape, along with prominent features obtained from the feature extraction process in each image, while ignoring the unhelpful background. It can be created by extracting features from several thousand images of the object from a specific vantage point, or the object can be selected either automatically online or by a user in the video sequence [6]. While different feature extraction methods exist, such as area-based clustering of features, point-based methods are considered more meaningful for understanding patterns in detail, such as collecting vehicle-specific information [19].

Target recognition
Target recognition is the process of detecting the target object in a specific scene. To accurately detect an object, the model created in the target representation procedure must be used, and search metrics and matching criteria must be defined by its features in a video frame [6]. This search criterion separates the background from the object in a single video frame. This is done on every frame, with each considered one at a time. Some high-level detectors utilize spatial information between several frames to detect an object, which reduces the number of misclassifications [6].

Target tracking
Target tracking is the process of estimating the location of a particular target over a period of time [6]. Multiple types of trackers exist, including point trackers, kernel trackers, and silhouette trackers [17]. Kernel trackers, in particular, are commonly used to calculate the motion between frames. These trackers rely on an object's appearance and shape. Point trackers, by comparison, track objects between neighbouring frames described by defined points. For this type of tracker, a detection method must be applied to extract the points in each frame [6].
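A minimal sketch of the point-tracking idea is shown below, associating detected centroids between neighbouring frames by nearest-neighbour distance. The coordinates are hypothetical, and real trackers combine this association with motion prediction:

```python
import math

def associate(prev_points, curr_points, max_dist=30.0):
    """Greedily match each previous centroid to its nearest current
    centroid within max_dist pixels; unmatched current points start
    new tracks."""
    matches, used = {}, set()
    for i, (px, py) in enumerate(prev_points):
        best_j, best_d = None, max_dist
        for j, (cx, cy) in enumerate(curr_points):
            if j in used:
                continue
            d = math.hypot(cx - px, cy - py)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    new_tracks = [j for j in range(len(curr_points)) if j not in used]
    return matches, new_tracks

# Two vehicles move ~10 px to the right; a third enters the frame
prev = [(100, 200), (400, 220)]
curr = [(110, 201), (409, 219), (50, 500)]
m, new = associate(prev, curr)
print(m, new)  # {0: 0, 1: 1} [2]
```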

Potential issues with target tracking with unmanned aerial systems
The majority of non-stationary video tracking devices, including unmanned aerial vehicles (UAVs), rely on the assumptions of a flat world and translational egomotion [20]. Fortunately, since UAVs record video at relatively low heights (often less than 121 m (400 ft) above the ground, due to FAA Part 107 regulations), the flat-world assumption holds in almost all cases. The second assumption, that egomotion is translational, is true when minimal rotation occurs between frames at the feature level [20]. Egomotion is defined as "any environmental displacement of the observer" [21]. In the case of a UAV in the sky, the assumption is that the UAV will not rotate while recording video used in the tracking process, so the movement of a feature can be classified as purely translational. These assumptions simplify the processing necessary to stabilize UAV video and show the importance of stabilizing the video while it is being captured [20]. Rotation can be minimised by flying UAVs only in adequate weather conditions. Further, most UAVs carry a gimbal for video stabilization, which allows the rotation of the camera about a single axis to be compensated, reducing movement while hovering.

State-of-the-art in image processing
Image processing and object detection can be accomplished using a variety of developed detection methods [22]. One specific approach, You Only Look Once (YOLO), has proven to be a leading framework for detection, consistently delivering strong detection accuracy and speed [22]. One-stage detectors such as YOLO are generally faster than two-stage detectors, as they avoid pre-processing algorithms and perform prediction with fewer candidate regions. YOLO performs the whole detection method in a single network, allowing the process to be optimised [23,24]. This approach also learns very general representations of objects, allowing it to outperform other widely used detection methods, such as the deformable part model (DPM) and R-CNN [23]. Details of YOLO and other object detection methods can be found in the previously published literature [22][23][24]. For any object tracking over a period of time, estimation must be made in conjunction with the detection approach. For vehicle tracking, the Kalman filter has been used as an optimal estimator that infers the parameters of interest from indirect, inaccurate, and uncertain observations through predicting and updating [25,26]. This estimation method has been successful in recent studies of vehicle detection from static cameras [25][26][27]. Details of the Kalman filter and other filter types can be found in the previously published literature [26,28].
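The predict/update cycle of a Kalman filter for a tracked vehicle centroid can be sketched as follows, assuming a constant-velocity motion model and position-only measurements. The noise settings and measurements are illustrative, not taken from this study:

```python
import numpy as np

dt = 0.5  # seconds between processed frames (assumed)
# Constant-velocity model: state = [x, y, vx, vy]
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # only position is observed
Q = np.eye(4) * 0.01   # process noise (assumed)
R = np.eye(2) * 4.0    # measurement noise (assumed)

x = np.zeros(4)         # initial state
P = np.eye(4) * 100.0   # large initial uncertainty

def kalman_step(x, P, z):
    # Predict the next state from the motion model
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the measured centroid z = [x_obs, y_obs]
    y = z - H @ x                   # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Feed noisy centroids of a vehicle moving ~10 px per processed frame
for z in [(10.2, 5.1), (20.1, 4.9), (29.8, 5.0), (40.3, 5.2)]:
    x, P = kalman_step(x, P, np.array(z))
print(x[:2])  # estimated position approaches the final measurement
```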

Unmanned aerial system applications

sUAS have historically been used for military applications. However, with commercialization and reductions in cost and size in recent years, the potential uses for these devices have grown. An sUAS comprises three components: (1) the aircraft, or UAV; (2) communication and control; and (3) the pilot on the ground. There are two types of UAV: fixed-wing and multirotor. Fixed-wing UAVs have an airplane-like design, generating lift from air passing underneath, while multirotor UAVs have several rotors with propellers that push air downwards [29]. While the fixed-wing design allows for longer flight time, the aircraft must always move forward at a certain minimum speed to generate enough lift to stay in the air. Thus, it is not possible to keep the UAV hovering at a single location of interest. Further, fixed-wing aircraft require a large, open location for take-off and landing [29]. Given these disadvantages, which do not affect multirotor aircraft, the multirotor aircraft is recommended for data collection, especially speed data collection. sUAS applications have been explored for many uses, including structural inspection, topographic surveying and mapping, crash reconstruction, and traffic monitoring [30].

Traffic monitoring
In recent years, sUAS have been introduced to the transportation community as a cost-effective solution for collecting trajectory data from the sky, replacing the older approach of using pre-installed static cameras. As discussed by Barmpounakis et al. (2016), sUAS offer higher reusability and energy efficiency than static cameras for traffic data collection [5].
Most multi-rotor sUAS have the flexibility to collect large amounts of aerial data almost anywhere in a matter of minutes. These devices can be programmed to automatically fly a particular route to collect specific aerial imagery, simplifying the flying process for the pilot. Their small size is also beneficial for collecting naturalistic data of a roadway segment, allowing for a generally non-intrusive way of collecting traffic data. However, a noteworthy limitation of multi-rotor sUAS is their small battery capacity, which only allows them to fly for short periods of time, often only 20 to 30 min [3,31]. Given that sUAS can hover above a highway and collect the speeds of many vehicles at once, it is possible to collect more traffic data during that short time than with traditional methods, such as roadside LiDAR. At the same time, for sites with notably low volumes, data collection via sUAS may not provide significant benefits over traditional roadside manual collection. Overall, for many roadway situations, the short battery life of a multi-rotor sUAS may not cause any issues. Extra sUAS batteries may also be carried if more than one deployment is necessary.
According to FAA regulations Part 107, sUAS can only be operated in adequate wind and weather conditions, causing limitations to their use. However, fair weather conditions are often necessary to collect accurate data via sUAS, given that wind and other weather conditions can cause the attached camera to shake. Finally, it is noted that sUAS must be operated by a Part 107 certificate holder if flown for non-recreational uses. This certificate requires a written exam to be passed, which costs $150 and must be retaken every two years to maintain certification.
Studies have been completed using sUAS for traffic surveillance, as well as roadway incident monitoring [3,4]. When utilizing sUAS for traffic monitoring, it is important to consider data collection accuracy. The most basic parameter is the number of pixels covering the recorded area in a video; as pixels increase, accuracy increases at a given altitude of recording [2]. Another basic parameter affecting the number of pixels is the camera resolution. As of June 2020, UAV cameras from popular commercial brands span a wide cost range, from approximately $60 for a 1920 × 1080 pixel resolution to approximately $2050 for a 5280 × 2972 pixel resolution. The limited accuracy assessment in previous studies demonstrates the necessity of this study. Overall, it is clear that there is a strong need for an improved methodology to collect volume data and operating speed data, and that sUAS combined with aerial image processing offers a promising solution to meet those needs.

METHODS
With the priority of understanding the accuracy of vehicle detection, regardless of speed, a vehicle tracking accuracy analysis was first conducted, followed by a speed-specific analysis from sUAS video. The following sections describe the methods of these two analyses. In both analyses, a Phantom 3 Pro UAV was used. This UAV's camera has a FOV of 94° and 1920 × 1080 pixel resolution, and it was flown at varying heights below the maximum of 400 ft above ground level (AGL). An outline of the processing steps developed and employed during this research is presented in Figure 1. Each step of this process is described in more detail throughout this section.

Vehicle tracking
Multiple intersections were selected for initial data collection to determine which location would be most appropriate for further analyses. These locations were in western Massachusetts and included the roundabout intersecting North Pleasant Street, Eastman Lane, and Governors Drive on the University of Massachusetts Amherst campus; the roundabout in downtown Amherst intersecting East Pleasant Street and Triangle Street; and the double roundabouts at Atkins Corner in Amherst. Images taken by sUAS at each of these locations are shown in Figures 2-4. These sites were originally chosen for three primary reasons. First, there is only one lane of vehicle movement, and the movement is generally a continuous motion throughout the image area. Second, there was a safe, authorised space for take-off. Third, there was a good line of sight to the detection area, with no obstructions such as tree cover. It is noted that different and more difficult conditions could be considered in future studies; however, this study focused on researching the base condition for data collection. After trial flights were completed at each of the location options, the location at Atkins Corner in Amherst was chosen due to its complexity. It was considered the location that would benefit most from sUAS vehicle tracking as compared to traditional methods of vehicle counting.
The altitude of each flight was chosen as the minimum height needed to capture the full intersection. For the chosen double roundabout intersection, the drone was flown at 400 ft AGL. As the required study area increased, the required drone altitude increased and, in turn, resolution decreased.
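This altitude/resolution trade-off can be quantified with a simple ground-footprint estimate, assuming a nadir-pointing camera with the 94° horizontal FOV and 1920-pixel width noted earlier (a flat-ground approximation, not a calculation from the paper):

```python
import math

def ground_width_per_pixel(altitude_ft, hfov_deg=94.0, px_width=1920):
    """Approximate ground footprint width and per-pixel ground distance
    for a nadir-pointing camera over flat ground."""
    width_ft = 2 * altitude_ft * math.tan(math.radians(hfov_deg) / 2)
    return width_ft, width_ft / px_width

for alt in (100, 200, 400):
    w, gsd = ground_width_per_pixel(alt)
    print(f"{alt} ft AGL: ~{w:.0f} ft wide frame, ~{gsd:.2f} ft/pixel")
```

At 400 ft AGL the frame spans roughly 860 ft and each pixel covers about 0.45 ft of ground, so doubling altitude doubles coverage while halving effective resolution.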
Data collection was completed during the winter season. To obtain the most realistic results, 12 videos averaging 7 min each were taken of the double roundabout during the morning peak hour from 7 AM to 9 AM, as this time set is most typically used for transportation planning. Each video was manually counted for vehicles to establish a ground truth by which the accuracy of the automated counting could be compared.
An automated vehicle tracking method was developed using computer vision in this study by employing two primary steps, which included a YOLO-based vehicle detection model and a Kalman filter-based vehicle tracking model. The first three steps of Figure 1 show the flowchart of the proposed method for this portion of the study.
Video pre-processing was the first step employed. The objective of this step was to effectively downsample the video frames and the image resolution of each frame. An effective downsampling strategy preserves all the features and frame rate necessary for the subsequent detection and tracking steps, respectively, while minimizing processing time. It is noted that downsampling in space (reducing resolution) and in time (reducing frame rate) may impact the performance of detection and tracking, respectively. Unfortunately, there is no universal approach for identifying the ideal resolution and frame rate, and they are often selected through trial-and-error experiments. Thus, the frame downsampling factor and the resolution downsampling factor were iteratively determined using a subset of the collected data by evaluating the corresponding detection rate and processing time. This was done to achieve the desired processing time without compromising the performance of detection and tracking before a full application of the algorithms was used on the entire dataset. From this process, the video data was downsampled in frame rate by a factor of 15 (i.e. resulting in 2 fps) and in resolution by a factor of 2 in both x- and y-directions (i.e. resulting in 1920 × 1080).
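The chosen downsampling factors can be sketched as below. This is a simplified decimation using array slicing on synthetic frames; a production pipeline would typically use a video decoder and an area-averaging resize instead:

```python
import numpy as np

FRAME_STEP = 15   # e.g. a 30 fps source becomes 2 fps, as in this study
SPACE_STEP = 2    # factor-2 downsample in both x and y

def downsample(frames):
    """Keep every FRAME_STEP-th frame and decimate each kept frame
    by SPACE_STEP in both spatial directions (nearest-neighbour)."""
    return [f[::SPACE_STEP, ::SPACE_STEP] for f in frames[::FRAME_STEP]]

# 60 small synthetic frames stand in for a decoded 2-second 30 fps clip
frames = [np.zeros((216, 384, 3), dtype=np.uint8) for _ in range(60)]
small = downsample(frames)
print(len(small), small[0].shape)  # 4 (108, 192, 3)
```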
The objective of the second step of vehicle detection was to identify the location of the vehicle appearing in each image frame. In this study, a deep learning framework called YOLOv3 [23,24] was employed for identifying the vehicles. The outcome of this method presented the bounding box of the detected vehicle in the image coordinate system. Figure 5 shows an example of detected vehicles.
Although YOLO has been widely applied to vehicle detection using pre-trained models, for example Microsoft Common Objects in Context (COCO), very few models to date have been trained for UAV applications, in which the captured vehicles show different appearances and features. Therefore, the research team retrained a new model for images captured from a UAV based on an open vehicle dataset collected by Kharuzhy [32]. For training on the UAV dataset [32], the authors selected the default YOLOv3 hyperparameters, including the learning rate (i.e. 0.001), batch size (i.e. 64), momentum (i.e. 0.9), and decay (i.e. 0.0005). The training time for each iteration was less than 5.5 s for the entire training dataset of 156 images (4928 vehicle instances). Figure 6 shows the average loss (i.e. the training accuracy indication [24]) and the mean average precision (mAP) (i.e. the validation/testing accuracy indication [33]) during training for 40,000 iterations on a single Nvidia GTX 1080Ti GPU. It should be noted that the performance of the network rapidly converged after about 5000 iterations, with an average loss of less than 2.0 and an mAP stabilised at 80%. Therefore, less than 4.5 h of training time can be anticipated with a similar computer specification, given the small training dataset and the efficient YOLOv3 architecture. However, the authors identified that a more aggressive learning-rate schedule (i.e. starting with a learning rate of 0.1 and stepping down by 1/10 every 2500 iterations) would further reduce the training time by approximately 15-30%.
The authors consciously used a termination criterion of a validation mAP greater than 80% to achieve rapid training. The authors noticed that the errors made by a vehicle detector with an mAP of 80% could be compensated by the subsequent tracking algorithm, which does not necessarily require the vehicles of interest to be accurately detected in every frame. Once the model was trained, the detection process required minimal manual intervention. However, the confidence threshold for the detected candidates needed to be empirically determined to optimize the receiver operating characteristic (ROC) curve, that is, to balance the false positives and the false negatives.
The objective of the final step of vehicle tracking was to associate the detected locations of vehicles (i.e. detectors) from consecutive frames into the same track, so that subsequent vehicle counting and speed computation became feasible. In this study, the detector was represented by the centroid of the bounding box from the second step. Figure 7 presents an example of a tracked vehicle superimposed on the captured frame. The Kalman filter [28] was employed to predict the motion of each detector. As vehicle tracking algorithms have been extensively investigated in previous research, the detailed formulation can be found in previously published literature [34].
Based on the closeness of the predicted location and the observed location (i.e. the detector from the following frame), the current detector was merged into an existing vehicle track or split into a new track [35]. Although the research team attempted to achieve the best detection rates in the second step, the vehicle tracking algorithm was able to correct a limited number of false positives and false negatives when these cases did not propagate across many consecutive frames. In this study, the tracking strategy removed a trajectory if its detector appeared in fewer than five frames, and split a trajectory if its detector was absent for more than ten frames.
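The pruning rules above (removal under five frames, splitting after ten missed frames) can be sketched as follows; the `Track` class and its counters are illustrative, not the study's implementation:

```python
MIN_TRACK_FRAMES = 5    # discard tracks detected in fewer than 5 frames
MAX_MISSED_FRAMES = 10  # close a track after 10 frames without a detector

class Track:
    def __init__(self, tid):
        self.tid = tid
        self.hits = 0     # frames with a matched detector
        self.missed = 0   # consecutive frames without one

    def update(self, matched):
        if matched:
            self.hits += 1
            self.missed = 0
        else:
            self.missed += 1

def should_close(track):
    """True once a track has gone unmatched for too many frames."""
    return track.missed > MAX_MISSED_FRAMES

def finalize(tracks):
    """Apply the removal rule: keep only sufficiently long tracks."""
    return [t for t in tracks if t.hits >= MIN_TRACK_FRAMES]

# A track hit for only 3 frames is pruned; one hit for 12 frames is kept
a, b = Track(1), Track(2)
for _ in range(3):
    a.update(True)
for _ in range(12):
    b.update(True)
print([t.tid for t in finalize([a, b])])  # [2]
```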

Speed
Route 9 in Amherst was chosen as the location of the speed experiment for four primary reasons. First, this roadway section had a low traffic volume and did not have any curves. Thus, for the speed measurement, the test vehicle would be able to drive at a consistent speed. Second, the lane widths and shoulder widths were able to be measured more easily than on a curve. This was needed to determine the distance represented by a single pixel. Third, there was a safe, authorised take-off space near the roadway. Fourth, there was direct line of sight to the roadway section. Overall, the speed study required a relatively simple environment, as the purpose of this study was to investigate a baseline case. An image of this location is shown in Figure 5.
The recorded portion of this experiment took place during the spring season.
To verify the accuracy of the speed data, probe drives were conducted simultaneously during video data collection. This was done by placing an "X" on the top of a vehicle and traversing the length of the roadway in the drone's view, while the drone flew at an altitude of 100 m (328 ft). The probe vehicle was driven at various speeds, and its speed was tracked through both its speedometer and a smartphone app. To maintain the highest degree of consistency between the drives, only one vehicle was used for this experiment.
The outcome of vehicle tracking is a trajectory that is represented in the image coordinate system. To compute vehicle speed, the real-world representation of the locations with geometrical information was required. Therefore, in this study, the research team extended the vehicle tracking flowchart with two additional steps for speed computation. Figure 1 presents the full flowchart for speed processing.
The objective of the fourth step, camera calibration, was to transform the image coordinate system to the world coordinate system, so that distances measured in units of pixels could be translated into units of feet or miles. In this study, the research team developed a simple homography, a 3 × 3 matrix, for transforming each coordinate pair. As the homography has eight degrees of freedom, at least four point pairs were required to compute the homography matrix [36].
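A direct linear transform (DLT) sketch of this homography estimation is shown below. The point correspondences are hypothetical, and in practice a library routine such as OpenCV's `findHomography` would normally be used:

```python
import numpy as np

def homography(src, dst):
    """Estimate the 3x3 homography mapping image points (pixels) to
    world points (e.g. ft) from >= 4 point correspondences via DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null-space vector of A (last row of V^T)
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, pt):
    """Map an image point through H, dividing out the scale factor."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]

# Hypothetical: four image corners of a 12 ft x 100 ft lane segment
img_pts = [(310, 820), (1610, 835), (1590, 240), (330, 228)]
world_pts = [(0, 0), (12, 0), (12, 100), (0, 100)]
H = homography(img_pts, world_pts)
print(apply_h(H, (310, 820)))  # maps back to (approximately) the origin
```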
The objective of the final step, speed computation, was to compute the vehicle speed for all extracted vehicle trajectories. In this study, the research team first computed the speed as the distance measured in the world coordinate system between consecutive frames, divided by the time between those frames. As presented in the video pre-processing step, the original video was downsampled in frame rate by a factor of 15, so the denominator of the speed computation in this step is 0.5 s. In any speed computation scenario, the accuracy of the vehicle localization (i.e. Steps 2 and 3) and the accuracy of the camera calibration (i.e. Step 4) may affect the accuracy of the speed computation. The accuracy with which the detected/tracked centroids reflect the vehicle's actual location is prone to changes of view angle, partial occlusion, or imperfect detection, which are inevitable; a slight shift of the centroids in consecutive frames may create a locational disturbance. In this study, the data collected by UAVs has the advantage of avoiding drastic changes of view angle and occlusions. Recognizing that imperfect detection will persist in any vehicle detection and tracking algorithm (i.e. Steps 2 and 3), the research team applied a simple median smoothing scheme to the derived speed with a window size of five, so that randomly occurring imperfect detections (i.e. incorrect centroids) are effectively disregarded. This is presented in the following section.
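The speed computation and median smoothing described above can be sketched as follows, using a hypothetical trajectory with one corrupted centroid:

```python
import numpy as np

DT = 0.5  # seconds between processed frames (30 fps downsampled by 15)
FPS_TO_MPH = 3600 / 5280  # ft/s -> mph

def speeds_mph(world_xy, window=5):
    """Frame-to-frame speed from world-coordinate positions (ft),
    median-smoothed to suppress isolated centroid errors."""
    xy = np.asarray(world_xy, dtype=float)
    dist = np.linalg.norm(np.diff(xy, axis=0), axis=1)  # ft per step
    v = dist / DT * FPS_TO_MPH
    # Median filter with the study's window of five
    half = window // 2
    padded = np.pad(v, half, mode='edge')
    return np.array([np.median(padded[i:i + window])
                     for i in range(len(v))])

# A vehicle at a steady 44 ft/s (30 mph), with one corrupted centroid
traj = [(i * 22.0, 0.0) for i in range(10)]
traj[5] = (traj[5][0] + 15.0, 0.0)  # simulated detection glitch
print(np.round(speeds_mph(traj), 1))  # all ~30 mph; glitch suppressed
```

Without the median filter, the single bad centroid would produce one spuriously high and one spuriously low speed sample.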

RESULTS
The following section describes the accuracy of the vehicle tracking and speed analyses.

Vehicle tracking
The Recall of each video was calculated as the true positives divided by the true positives plus the false negatives. In other words, the Recall was the number of vehicles that the software detected, divided by the total number of vehicles that actually passed through the video. Across all of the videos analysed, the Recall was 93%. A full table of the ground truth, true positives, and false positives is included in the Appendices. The Recall was also separated by origin-destination; the labels for these pairs are shown in Figure 8. Table 1 presents the Recall for each origin-destination pair in the video, where N/A represents a pair for which no vehicles were present. Each origin had three possible destination pairs, for a total of 12 pairs. The U-turn (e.g. BL-BL) origin-destination pairs were omitted, as the traffic volume was near zero for each one. Precision, which is the true positives divided by the total of the true positives plus the false positives, averaged 93% as well. This is presented in Table 2, where N/A again represents origin-destination pairs that did not have any vehicles present.

FIGURE 8 Origin-destination labels
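The Recall and Precision definitions used in this evaluation reduce to two one-line formulas; the counts below are hypothetical, not the study's data:

```python
def recall(tp, fn):
    """Detected vehicles / all vehicles actually present."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Correct detections / all detections reported."""
    return tp / (tp + fp)

# Hypothetical counts for one origin-destination pair
tp, fp, fn = 93, 7, 7
print(f"Recall = {recall(tp, fn):.0%}, "
      f"Precision = {precision(tp, fp):.0%}")
```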

Impact of lighting on detection
In the early morning, specifically in videos captured from 7:00 AM to 7:20 AM, accuracy was lower, with an average Recall of 83%, than during the rest of the morning, which had an average Recall of 94%. This was due to the poorer lighting in the early morning, as the sun was still rising. The imagery captured by the sUAS is presented in Figures 8 and 9, at 8:30 AM and 7:00 AM respectively.
To overcome this issue with low lighting, which interferes with detection, other detection methods could be explored, including using a thermal camera instead of a standard camera. These thermal cameras are already being used on sUAS in other industries [37], and advanced infrared usage for detection is becoming more understood [38]. This is an important barrier to overcome, as traffic data is typically collected during the peak hours of 7 AM to 9 AM and 4 PM to 6 PM, and these times coincide with darkness during the winter months. These results are summarised in Table 3.

FIGURE 10
Image of the roundabout from sUAS

Static camera comparison
At some locations, static video cameras may already be available in areas where sUAS video data could be collected, or could be installed on a nearby pole. In these situations, it may be unclear whether sUAS video data is beneficial to use over the static camera data. To understand this difference, a comparison between a static camera and sUAS video data was completed at a roundabout on the University of Massachusetts Amherst campus. The UAV used in flight was the DJI Phantom 3 Standard, which has a FOV of 94°. In all flights, the UAV was flown at 100 m (328 ft) AGL. The static camera was placed on the roof of a nearby four-story building and had a resolution of 1920 × 1080. Figure 10 presents the view of the sUAS footage of the roundabout, and Figure 11 presents the perspective of the static camera.
Data were collected during an AM peak hour from 8 AM to 9 AM, simultaneously using the sUAS and static camera, during the fall season. Four individual sUAS flights were completed during this time period due to battery constraints. As a result, during short periods of time throughout the span of the hour, data was not collected by sUAS. These data gaps could be overcome through the use of two sUAS to collect data or by extrapolating the current data to cover the time gaps.

FIGURE 11 Image of the roundabout from the static camera
To create a fair comparison, video data from the static camera was cut into the same periods as those collected via sUAS. Data from both videos were analysed using the same training model noted in the vehicle tracking process. To establish the ground truth for the origin-destination data, all vehicles were counted manually. The final comparison between the ground truth, static camera, and sUAS is presented in Table 4 in terms of counts and calculated relative errors.
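The count comparison in Table 4 reduces to relative errors against the manual ground truth; a minimal sketch with hypothetical per-direction counts (not the values in Table 4):

```python
# Hypothetical origin-destination counts per approach direction;
# the study's actual values are those reported in Table 4.
ground_truth = {"N": 120, "E": 95, "S": 110, "W": 88}
suas_counts = {"N": 128, "E": 101, "S": 117, "W": 93}     # slight overcounts
static_counts = {"N": 114, "E": 91, "S": 105, "W": 85}    # slight undercounts

def mean_relative_error(measured: dict, truth: dict) -> float:
    """Average |measured - truth| / truth across all directions."""
    errors = [abs(measured[d] - truth[d]) / truth[d] for d in truth]
    return sum(errors) / len(errors)

suas_err = mean_relative_error(suas_counts, ground_truth)
static_err = mean_relative_error(static_counts, ground_truth)
```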
Data from the static camera was found to be 2.3% more accurate than the sUAS data under the same data processing.
As mentioned previously, the detector for the sUAS images was trained on a publicly available dataset with only 4928 vehicle instances. In comparison, the detector for the static camera images used a pre-trained model from the COCO dataset with 12,786 vehicle instances. It is therefore unsurprising that the results show a higher accuracy for the static camera. Further, it should be noted that the authors balanced training time against the performance of the trained detector for the sUAS images (i.e. mAP > 80%).
As reported in Table 4, it should also be noted that the model (detection and tracking) for the sUAS images slightly overcounts vehicles across all directions. The authors observed that the overcounting was attributable to imperfect tracking that split trajectories even though the vehicles themselves were still correctly detected. These errors could be reduced by improving the tracking algorithm used in this study. In contrast, the model for the static camera images often undercounts vehicles in almost all directions, which implies that the tracking algorithm lost vehicles entirely due to incorrect or missing detections.
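One way to mitigate the trajectory-splitting overcounts described above is to merge track fragments that begin shortly after, and close to where, another fragment ended. A heuristic sketch (the thresholds and coordinates are illustrative assumptions, not values from this study):

```python
from dataclasses import dataclass

@dataclass
class Track:
    start_frame: int
    end_frame: int
    start_xy: tuple   # (x, y) pixel position at first detection
    end_xy: tuple     # (x, y) pixel position at last detection

def should_merge(a: Track, b: Track,
                 max_gap_frames: int = 15,
                 max_dist_px: float = 50.0) -> bool:
    """Treat fragment b as a continuation of fragment a if it starts
    within a short frame gap and near where a ended."""
    gap = b.start_frame - a.end_frame
    if not (0 < gap <= max_gap_frames):
        return False
    dx = b.start_xy[0] - a.end_xy[0]
    dy = b.start_xy[1] - a.end_xy[1]
    return (dx * dx + dy * dy) ** 0.5 <= max_dist_px

# A split trajectory: one vehicle broken into two fragments.
a = Track(0, 100, (10, 10), (300, 310))
b = Track(104, 200, (308, 318), (600, 610))
print(should_merge(a, b))  # True -> count as one vehicle, not two
```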
Overall, given the constraints of static cameras, using an sUAS for data collection could be a practical solution in areas where such cameras are not already present.

Speed
The accuracy of the automated drives compared to the probe drives is presented in Table 5. The actual speeds on the six probe drives over the approximately 450-ft section of roadway were verified against the speedometer throughout each drive and recorded with a smartphone app. All drives were completed by a single driver for consistency during a single midday time period. The relative error values ranged from 2.8% to 8.4%, with an average of 6.6% across all of the drives. This error was larger than anticipated and shows that further research is needed to achieve better accuracy. The measured speeds along the roadway section are presented in Figure A1 in the Appendices, where the light dots are individual measurements, the black dotted line is the best fit of the individual points, and the solid grey line is the ground truth speed. The x-axis represents elapsed time by frame, and the y-axis represents vehicle speed (mph). Traditional methods of short-term speed data collection include pneumatic tubes, LiDAR, and radar. Pneumatic tubes were found to have a speed percent error of 4.2% [39], while LiDAR and radar sensors have accuracies between −1 mph and +3 mph [40, 41]. Comparing the average error of 6.6% from the probe drives to 4.2% from pneumatic tubes, the new speed data collection method may not yet be on par with traditional methods, but it is not far off. Future testing and improvement are recommended to achieve accuracy comparable to, or better than, traditional methods.
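Speed estimates like those above are derived from pixel displacement between frames, scaled by the ground sampling distance at the flight altitude. A minimal sketch (the displacement, frame rate, and feet-per-pixel scale below are illustrative assumptions, not the study's calibration values):

```python
def speed_mph(dx_px: float, dy_px: float, gsd_ft_per_px: float,
              frames: int, fps: float) -> float:
    """Estimate vehicle speed from pixel displacement over a span of frames.

    gsd_ft_per_px: ground sampling distance (feet per pixel) at the
    flight altitude; depends on the camera and altitude.
    """
    dist_ft = ((dx_px ** 2 + dy_px ** 2) ** 0.5) * gsd_ft_per_px
    seconds = frames / fps
    return dist_ft / seconds * 3600.0 / 5280.0  # ft/s -> mph

# Hypothetical: 450 px travelled in 150 frames at 30 fps, 0.30 ft/px.
print(round(speed_mph(450.0, 0.0, 0.30, 150, 30.0), 1))  # ~18.4 mph
```

Averaging such per-interval estimates along a trajectory smooths out detection jitter, which is one route to reducing the 6.6% error reported above.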

DISCUSSION
Through all sUAS flights, minimal video distortion and vibration were experienced. The results of this vehicle tracking method using an sUAS are comparable with results from infrastructure-mounted LiDAR sensors at intersections. Research by Zhao et al. (2019) found that such sensors can track vehicles at an intersection with an accuracy of 94.6% to 97.1% [42]. Overall, sUAS flights may be simpler and more practical for collecting the required data in some cases than mounting LiDAR sensors, especially at locations that lack easily accessible roadside poles for mounting cameras or that require a larger detection range. Compared to other traditional methods, this sUAS method could meet many data collection needs where short-term data collection is the primary goal.

CONCLUSIONS
This research explored the use of sUAS as a traffic data collection tool, compared to traditional volume and speed data collection instruments on roadways, through a field study in the United States. Previous literature has found that sUAS are being utilised for survey work and are emerging devices in the field of traffic data collection. Using the data collected in the field and a literature review, a methodology was developed and its accuracy evaluated. Two experimental studies were completed to compare the accuracy of vehicle tracking data and speed data collected via sUAS imagery against traditional methods. Using sUAS and video processing, the developed method was found to have an average trajectory count accuracy of 93%. Further, the speed data collected and analysed using our developed method had an average relative error of 6.6%. The speed errors from our method were in the same range as the errors experienced with LiDAR and radar sensors, which are traditionally used today; these sensors have an error range of −1 to +3 mph [40, 41]. Vehicle tracking using sUAS was found to have a relative error 2.3% higher than that of a static camera. However, mounted static cameras are not always feasible for data collection. Even in areas where it is possible to mount a static camera on a pole, these devices may have problematic viewpoints with missing information and may be time-consuming and/or dangerous to position for a better perspective of the given area. Therefore, in many cases, using an sUAS over a static camera is practical even given the small difference in relative error.
Future work to further develop the use of sUAS for traffic monitoring could include turning movement counts, conflict-event studies, intersection delay measurements, parking utilisation tracking, and queue studies. Further, an interesting extension of this study would be the development of models exploring the relationship between the Recall of vehicle detection/vehicle counts and speed measurement accuracy. Establishing these relationships will require evaluating vehicle tracking performance (e.g. [43]) and understanding how detection performance impacts tracking. Finally, further research could continue exploring the best vehicle tracking methods using sUAS to obtain the most accurate results. This work may include volume and speed studies along with the development of a software platform to aid the implementation of sUAS for traffic data collection.

FIGURE A1
Detected and ground truth probe drive speeds