Pixel-parallel feature detection on vision chip
Abstract
Pixel-parallel FAST and SIFT feature detection algorithms suited to the vision chip architecture are proposed. The proposed algorithms execute FAST and SIFT feature detection on the vision chip in a pixel-parallel manner, using only basic logic and arithmetic operations. Experiments and detailed analysis show that the proposed algorithms perform feature detection on the vision chip at a higher speed than implementations on a general-purpose processor or an application-specific circuit.
Introduction
Feature detection in an image sequence is a prerequisite step for many image processing and computer vision algorithms such as object tracking and recognition. It labels the local regions of an image that contain a ‘feature’ so that local appearance properties can be further exploited. The speed of feature detection is a key requirement for the real-time or high-speed realisation of many algorithms. However, the performance of central processing unit (CPU)-based feature detection is inadequate for high-speed processing. Recently, a massively parallel system-on-chip (SoC) called the vision chip [1-4] has been widely used in the field of high-speed image processing. It adopts an array of single-instruction multiple-data (SIMD) processing elements (PEs) to execute image processing algorithms in a pixel-parallel manner. However, current feature detection algorithms cannot be directly implemented on vision chips. In this Letter, we modify the FAST and SIFT feature detectors in a pixel-parallel way so that they can be implemented on a vision chip. Experiments and performance analysis have been carried out. The results indicate that the proposed methods achieve higher performance than previous works.
Proposed FAST feature detection
The FAST [5] detector operates on a discretised circle around a candidate point p, as shown in Fig. 1a. The circle usually contains 16 pixels. p is classified as a feature pixel if there exists a contiguous arc of at least n pixels on the circle that are all brighter or all darker than p by a threshold t. The original FAST detector adopts a decision tree to improve performance on a CPU-based system. However, this method is not suitable for the vision chip, whose pixel-parallel architecture differs significantly from that of a general-purpose CPU. Our proposed FAST-9 (FAST with n = 9) detector on the vision chip is shown in Fig. 1b.

Fig. 1 FAST feature detector and proposed FAST-9 feature detection procedure on vision chip
a FAST feature detector
b Proposed FAST-9 feature detection procedure on vision chip



The FAST implementation of Fig. 1b requires only simple logic and arithmetic operations and needs no flow control, which makes it suitable for realisation on a SIMD vision chip. Since all PEs execute the same operations at the same time, the operations need to be carried out only once to complete FAST feature detection for the whole image.
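The following NumPy sketch illustrates how a branch-free FAST-9 test can be expressed with whole-array comparisons and logic operations only. It is not the procedure of Fig. 1b; the circle offsets, the function name `fast9_pixel_parallel` and the edge handling are assumptions made for illustration, and each array operation stands in for one instruction issued to all PEs at once.

```python
import numpy as np

# Assumed 16-pixel circle of radius 3, listed clockwise as (dy, dx) offsets.
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
          (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]


def fast9_pixel_parallel(img, t, n=9):
    """Boolean mask of FAST-n feature pixels, computed for all pixels at once."""
    img = img.astype(np.int32)
    h, w = img.shape
    padded = np.pad(img, 3, mode="edge")
    # One shifted copy of the image per circle position (neighbour access).
    ring = [padded[3 + dy:3 + dy + h, 3 + dx:3 + dx + w] for dy, dx in CIRCLE]
    brighter = [r > img + t for r in ring]
    darker = [r < img - t for r in ring]

    def has_arc(flags):
        # OR over the 16 start positions of the AND of n consecutive flags.
        found = np.zeros((h, w), dtype=bool)
        for start in range(16):
            run = np.ones((h, w), dtype=bool)
            for k in range(n):
                run &= flags[(start + k) % 16]
            found |= run
        return found

    corners = has_arc(brighter) | has_arc(darker)
    # The circle is incomplete within 3 pixels of the border, so clear it.
    corners[:3, :] = False
    corners[-3:, :] = False
    corners[:, :3] = False
    corners[:, -3:] = False
    return corners


if __name__ == "__main__":
    demo = np.zeros((20, 20), dtype=np.uint8)
    demo[8:, 8:] = 100  # bright quadrant whose corner lies at (8, 8)
    print(np.argwhere(fast9_pixel_parallel(demo, t=20)))
```

The loops run over circle positions, not over pixels, so the instruction count is independent of the image size, which is the property the text relies on.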
Proposed SIFT feature detection
SIFT [6] is widely regarded as one of the most popular and highest-quality feature detectors currently available. It selects the local extrema of an image filtered with a difference of Gaussians. The key step of SIFT feature detection is therefore to obtain Gaussian-filtered images with different coefficients. In the original SIFT algorithm, these Gaussian images are obtained by complex arithmetic calculation. In the proposed method, as shown in Fig. 2a, they are generated by filtering the original image 16 times in succession with a primary Gaussian filter of coefficient δ. The method is based on the Gaussian cascading property [7], which states that filtering an image n times with a Gaussian of coefficient δ is equivalent to filtering it once with a Gaussian of coefficient √n·δ. Therefore, the five scales in an octave of the SIFT algorithm are acquired by taking the images filtered 1, 2, 4, 8 and 16 times. Gaussian filtering can be carried out on a vision chip in a pixel-parallel way, which ensures the high performance of the proposed procedure. Furthermore, as shown in Fig. 2b, if a Gaussian filter is separable, the kernel (first row) of the filter can be used to filter the image both vertically and horizontally, and the resulting image is equal to the Gaussian-filtered image. Separating the two-dimensional (2D) Gaussian filter into two 1D kernels reduces the computational overhead of calculating the Gaussian image and further improves performance. Three kernels corresponding to Gaussian filters with different σ values are carefully selected. The divisions by 8, 16 and 32 required by these kernels are performed by discarding the 3, 4 and 5 least significant bits of the results, respectively. The remaining steps of the SIFT algorithm, namely computing the difference of the Gaussian images and non-maximal suppression, can be easily carried out in the PE array of the vision chip.
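A minimal sketch of the cascaded, separable, integer Gaussian filtering described above is given here. The kernel `[1, 4, 6, 4, 1]` (sum 16, so the division is a 4-bit right shift) is an assumption chosen for illustration and is not one of the three kernels of Fig. 2b; `effective_sigma` checks the cascading property numerically for this assumed kernel in floating point.

```python
import numpy as np

KERNEL = np.array([1, 4, 6, 4, 1], dtype=np.int64)  # assumed 1D kernel, sum = 16
SHIFT = 4                                            # divide by 16: drop 4 LSBs


def filter_1d(img, axis):
    """One integer pass of KERNEL along `axis`: multiply-adds, then a bit shift."""
    img = img.astype(np.int64)
    r = len(KERNEL) // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (r, r)
    padded = np.pad(img, pad, mode="edge")
    n = img.shape[axis]
    acc = np.zeros_like(img)
    for k, weight in enumerate(KERNEL):
        acc += weight * np.take(padded, np.arange(k, k + n), axis=axis)
    return acc >> SHIFT


def primary_gaussian(img):
    """Separable 2D filter: the same 1D kernel applied vertically, then horizontally."""
    return filter_1d(filter_1d(img, axis=0), axis=1)


def gaussian_scales(img, passes=(1, 2, 4, 8, 16)):
    """Apply the primary filter repeatedly and keep the images after `passes` passes."""
    scales = {}
    current = img.astype(np.int64)
    for i in range(1, max(passes) + 1):
        current = primary_gaussian(current)
        if i in passes:
            scales[i] = current.copy()
    return scales


def effective_sigma(n_passes):
    """Standard deviation of the kernel convolved with itself n_passes times."""
    k = KERNEL / KERNEL.sum()
    acc = np.array([1.0])
    for _ in range(n_passes):
        acc = np.convolve(acc, k)
    x = np.arange(len(acc)) - (len(acc) - 1) / 2
    return float(np.sqrt(np.sum(acc * x ** 2)))


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, effective_sigma(n))
```

For this assumed kernel a single pass has unit standard deviation, so the images kept after 1, 2, 4, 8 and 16 passes differ in scale by a factor of √2 from one to the next. The integer version truncates after every pass, which is the kind of approximation error the experimental section attributes to the integer Gaussian filters.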

Fig. 2 Procedure of generating Gaussian images on vision chip
a Generate different Gaussian images with primary Gaussian filter
b Use kernels to implement primary Gaussian filters
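The remaining SIFT steps, the difference of Gaussian images and non-maximal suppression, can be sketched in the same pixel-parallel style. The function name `dog_extrema`, the 26-neighbour strict-extremum test and the `threshold` parameter are assumptions for illustration; the Letter does not specify these details.

```python
import numpy as np


def dog_extrema(scales, threshold=0):
    """Strict local extrema of the difference-of-Gaussian (DoG) stack.

    `scales` is a list of equally sized 2D arrays ordered from fine to coarse.
    Returns one boolean mask per interior DoG layer.
    """
    dogs = [b.astype(np.int64) - a.astype(np.int64)
            for a, b in zip(scales, scales[1:])]
    h, w = dogs[0].shape
    masks = []
    for s in range(1, len(dogs) - 1):
        centre = dogs[s]
        is_max = np.ones((h, w), dtype=bool)
        is_min = np.ones((h, w), dtype=bool)
        for layer_index in (s - 1, s, s + 1):
            padded = np.pad(dogs[layer_index], 1, mode="edge")
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if layer_index == s and dy == 0 and dx == 0:
                        continue  # do not compare a pixel with itself
                    neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                    # Whole-array comparisons: one step for every pixel.
                    is_max &= centre > neighbour
                    is_min &= centre < neighbour
        masks.append((is_max | is_min) & (np.abs(centre) > threshold))
    return masks
```

With the five scales of one octave this yields four DoG layers and therefore two layers in which extrema can be tested against both neighbouring scales.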
Experimental results and evaluations
The proposed feature detectors are implemented on a field programmable gate array (FPGA). A vision chip architecture containing a 128 × 64 PE array is implemented on the FPGA, which runs at 50 MHz. The PE array can carry out ‘bit-and’, ‘bit-inversion’, ‘bit-or’ and ‘bit-addition’ operations. Fig. 3 shows the operation breakdown of the proposed FAST and SIFT feature detections on the vision chip. Most of the operations are ‘bit-addition’ and ‘bit-and’. Since every kernel of the proposed SIFT algorithm has to be run many times, SIFT takes many more operations than FAST. The results of performing FAST-9 and SIFT on the FPGA are shown in Fig. 4. The FAST-9 detection result is almost identical to its MATLAB counterpart shown in Fig. 4d, since the two are mathematically equivalent. The SIFT features detected on the FPGA differ slightly from their personal computer (PC) counterparts owing to the integer approximation of the Gaussian filters. However, most of the feature points are detected, and our experiments indicate that these differences from the original SIFT algorithm do not affect the results of the final recognition process.
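To illustrate how multi-bit pixel arithmetic can be built from one-bit operations of this kind, the sketch below adds two images bit plane by bit plane with a ripple-carry scheme. How the PE actually realises ‘bit-addition’ is not described in the text, so the full-adder logic, the bit-plane layout and all function names here are assumptions.

```python
import numpy as np


def to_bitplanes(img, bits):
    """Split an unsigned image into boolean bit planes, least significant first."""
    img = img.astype(np.int64)
    return [((img >> b) & 1).astype(bool) for b in range(bits)]


def from_bitplanes(planes):
    """Recombine boolean bit planes (least significant first) into integers."""
    total = np.zeros(planes[0].shape, dtype=np.int64)
    for b, plane in enumerate(planes):
        total += plane.astype(np.int64) << b
    return total


def bit_xor(a, b):
    """XOR expressed with and, or and inversion only."""
    return (a | b) & ~(a & b)


def bit_serial_add(a_planes, b_planes):
    """Ripple-carry addition: one full-adder step per bit plane, all pixels at once."""
    carry = np.zeros(a_planes[0].shape, dtype=bool)
    out = []
    for a, b in zip(a_planes, b_planes):
        a_xor_b = bit_xor(a, b)
        out.append(bit_xor(a_xor_b, carry))      # sum bit
        carry = (a & b) | (carry & a_xor_b)      # carry into the next plane
    out.append(carry)                            # final carry is the extra MSB
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(0, 256, size=(4, 4))
    y = rng.integers(0, 256, size=(4, 4))
    z = from_bitplanes(bit_serial_add(to_bitplanes(x, 8), to_bitplanes(y, 8)))
    print(np.array_equal(z, x + y))
```

Under this model an 8-bit addition costs eight full-adder steps regardless of the array size, which is consistent with addition and AND operations dominating the operation count.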

Fig. 3 Operation breakdown of proposed feature detections

Fig. 4 Example and comparison of feature detection results
a Original image
b SIFT feature detected by FPGA
c FAST-9 feature detected by FPGA
d FAST-9 feature detected by MATLAB

Table 1: Comparison with previously reported hardware implementations of feature detectors

| | This work | Moko et al. [8] | Clemons et al. [9] | Dohi et al. [10] |
|---|---|---|---|---|
| Algorithm | FAST, SIFT | FAST | FAST, SIFT | FAST |
| Implementation | FPGA | Embedded processor | Multicore processor | FPGA |
| Clock frequency | 50 MHz | 168 MHz | 1 GHz | 25 MHz |
| Resolution (pixels) | 512 × 512 | 512 × 512 | 1024 × 768 | 640 × 480 |
| Processing speed (fps) | FAST: 460; SIFT: 30 | 27.3 | ∼40 | 62.5 |

Table 1 compares this work with previously reported hardware implementations of feature detectors. Compared with [8, 10], the proposed FAST feature detection processes images of approximately the same resolution at a much higher frame rate (460 fps against 27.3 and 62.5 fps, respectively). Compared with [9], the image resolution in our experiments is three times lower, but the processing speed of the FAST algorithm is more than 10 times higher. The SIFT implementation of [9] achieves a higher performance than ours; however, it runs at a clock frequency 20 times higher and its hardware is much more complex.
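The ratios quoted above can be reproduced directly from the entries of Table 1, as in the short sketch below. The figure ∼40 fps for [9] is treated as exactly 40, and the pixels-per-clock-cycle metric is an illustrative normalisation introduced here, not one used in the Letter.

```python
# (clock frequency in Hz, pixels per frame, frames per second), from Table 1.
ROWS = {
    "This work, FAST": (50e6, 512 * 512, 460.0),
    "This work, SIFT": (50e6, 512 * 512, 30.0),
    "Moko et al. [8]": (168e6, 512 * 512, 27.3),
    "Clemons et al. [9]": (1e9, 1024 * 768, 40.0),  # table gives "about 40"
    "Dohi et al. [10]": (25e6, 640 * 480, 62.5),
}


def pixels_per_cycle(clock_hz, pixels, fps):
    """Pixels processed per clock cycle (illustrative metric, an assumption)."""
    return pixels * fps / clock_hz


ours = ROWS["This work, FAST"]
clemons = ROWS["Clemons et al. [9]"]

resolution_ratio = clemons[1] / ours[1]  # how much larger the images of [9] are
fps_ratio = ours[2] / clemons[2]         # FAST frame-rate advantage over [9]
clock_ratio = clemons[0] / ours[0]       # clock-frequency advantage of [9]

if __name__ == "__main__":
    print(resolution_ratio, fps_ratio, clock_ratio)
    for name, row in ROWS.items():
        print(name, pixels_per_cycle(*row))
```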
Conclusion
We have introduced a pixel-parallel implementation of feature detection for vision chips. The proposed FAST and SIFT feature detections can be carried out on the vision chip with basic logic and arithmetic operations. Experimental results on 512 × 512 images at a clock frequency of 50 MHz show processing speeds of 460 fps for FAST-9 and 30 fps for SIFT, indicating that pixel-parallel feature detection on the vision chip greatly improves feature detection performance.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (nos. 60976023 and 61234003) and the Special Funds for the Major State Basic Research Project of China (no. 2011CB93-2902).