A pixel pair–based encoding pattern for stereo matching via an adaptively weighted cost

Stereo matching, a key problem in computer vision, faces the challenge of radiometric distortions. Most existing stereo matching methods are based on simple matching cost algorithms and suffer from mismatches under radiometric distortions, so the robustness and accuracy of matching cost algorithms need to be improved. A novel encoding pattern is proposed for stereo matching. In the proposed encoding pattern, each matching window in the grey image and the gradient images is divided into several isoline-like sets with different radii. Pixel pairs are then defined in the isoline-like sets, and an encoding function decides the relative order between the two pixels in each pixel pair. To apply the pattern to matching cost computation and enhance matching accuracy, an adaptively weighted cost related to the isoline-like sets is designed. Experiments on the Middlebury and KITTI data sets show the validity of the proposed method under severe radiometric distortions, and comparisons with several widely used methods illustrate its advantages.


INTRODUCTION
Stereo matching aims to obtain stereo disparity. It is one of the most important and challenging topics in computer vision. The disparity map plays a key role in many applications, such as robotics, self-driving cars and 3D reconstruction. Although stereo matching has been studied for many years, an accurate dense disparity map is still difficult to obtain because of occluded regions, texture-less regions, object boundaries, radiometric distortions, etc. Traditional stereo matching algorithms can be classified as local stereo algorithms and global stereo algorithms [1]. Local stereo algorithms [2][3][4] generally have the following four steps: matching cost computation, cost aggregation, disparity computation and disparity refinement. They consider only local aggregation regions, so only the information of surrounding pixels is used. Global stereo algorithms [5] typically consist of matching cost computation, disparity optimisation and disparity refinement steps, and are able to obtain accurate disparities. The simplest matching costs, such as the Absolute Difference (AD), Squared Difference (SD) and Sum of Absolute Differences (SAD) [6], are based on the colour-consistency assumption that each pixel in the left image and its corresponding pixel in the right image have similar intensity values. However, in real scenarios this assumption does not hold because of radiometric distortions. To tackle this problem, the Census transform [7] was introduced to establish the relative orders between the centre pixel and the neighbouring pixels in the matching window. But Census is not robust because it relies heavily on the intensity of the centre pixel and ignores the relationships between the neighbouring pixels themselves. Normalised Cross-Correlation (NCC) [8], a window-based matching algorithm, performs well in dealing with Gaussian noise, but it tends to blur depth discontinuities.
Based on the assumption of a Lambertian world, adaptive NCC (ANCC) [9] robustly handles various radiometric distortions. It performs well under lighting geometry and camera parameter changes, but it fails to obtain accurate disparity maps when facing severe illumination changes, large brightness differences or a non-Lambertian world. MI [10] is based on the entropy of the probability distributions of the overlapping parts of the stereo images. It can handle noise and complex radiometric distortions to some extent, but it performs poorly on severely intensity-distorted stereo images. In the Support Local Binary Pattern (SLBP) [11], ordinal values are extracted and the image window is described as a bit string. It is designed to tackle radiometric distortions, but SLBP ignores the intensity difference between pixels since a binary encoding function is used. The fuzzy encoding pattern (FEP) [12] divides the image window into disjoint neighbouring pixel sets and establishes relative orders between two pixels in every set. It uses a novel fuzzy function to represent the relationship between pixels and performs well under radiometric distortions and exposure changes. However, it requires the computation of a string of 408 integers for every pixel, which limits its computational speed. The robust soft rank transform (RSRT) [13] establishes the relative orders in the same way as FEP; the difference between them is the way of encoding, and RSRT can be considered a fast version of FEP. In addition, some works, such as [14], integrate a denoising process into stereo matching to address the existence of multiple types of noise. However, they do not specifically focus on the situation where the image pairs are taken under different lighting or illumination conditions.
The images used by these methods are obtained under almost the same conditions, despite the existence of noise in the surrounding environment. Instead of these hand-crafted cost computation schemes, Zbontar [15] uses convolutional neural networks (CNNs) to learn a more comprehensive cost measurement. By comparing the similarity of a pair of patches and using the similarity score as the matching cost, [15] obtains robust results on the Middlebury [16] and KITTI [17] stereo benchmarks. [15] is used to compute matching costs by many stereo matching algorithms, such as [18][19][20][21]. Also, based on [17], many CNN-based stereo matching cost computation methods [22][23][24][25] have been proposed. Luo [22] proposes a matching network with fast computation; the network outputs 64-dimensional features as the representation of each patch, and [22] obtains better matching performance by using the inner product of features as the similarity between patches. Park [23] proposes a CNN that computes the matching cost with a large-sized window; it can avoid the fattening effect while extracting information from a large area. Yang and Lv [24] learn a Euclidean embedding for each image using a CNN with a triplet-based loss function and a smoothness constraint; they generalise the learning result to SGM [10] and obtain comparable performance. In addition, Dong [26] combines the SAD, GRAD and CNN [15] methods to construct a new stereo matching cost volume and obtains better performance than the original stereo matching costs. CNN methods tend to enhance weak textures and exhibit imaginary textures in occluded areas, but they require much time to train the network on specific data sets. Hand-crafted cost computation methods do not need training and can be directly applied to real scenarios.
Our proposed encoding pattern focuses on the problem of radiometric distortions. Like other encoding patterns, such as Census and FEP, matching windows on the original grey images are considered in the proposed encoding pattern. The way of dividing the matching windows is inspired by a traditional image feature extraction method, the histogram of oriented gradients (HOG) [27]. HOG computes on uniformly spaced cells and maintains good invariance to both geometric and photometric deformations of the images. Radiometric distortions, which involve such deformations, can be avoided to some extent, since the influence of the distortions is smaller in small local areas. In FEP, two non-adjacent pixels are considered, so the way of establishing relative orders in FEP may not be robust enough. To obtain more robust results, a novel encoding pattern is proposed. Unlike other encoding patterns such as Census and FEP, the proposed encoding pattern focuses on neighbouring pixels. In addition, a new encoding function is designed to obtain accurate relative orders. For the relative orders obtained by the proposed encoding method, the stereo matching cost should also be considered. Unlike the costs used in Census and FEP, the differences between the relative orders are weighted. As in [28], gestalt grouping by proximity is used to establish the adaptive weights in the cost. The main contributions of this paper are summarised as follows: i. A novel encoding pattern is proposed. It is consistent with HOG's methodology and maintains good invariance to distortions, since distortions rarely appear in small areas. Thus, it can improve robustness under radiometric distortions.
ii. A discretisation encoding function, whose interval size is easy to tune, is proposed. Compared with Census's encoding function, the proposed function considers the intensity difference between two pixels. Compared with FEP's encoding function, the proposed function is more flexible.
iii. After encoding, in order to guarantee the accuracy of stereo matching, an adaptively weighted cost related to the isoline-like sets is designed. It is consistent with gestalt grouping by proximity.
Experiments will be given later to show the strength of the proposed method. Compared with traditional methods, it is robust against radiometric distortions and practical for real-world applications. Also, compared with CNN-based methods, our method can be used without training; in fact, selecting a specific and effective training data set for a neural network can be difficult in practice. The rest of the paper is organised as follows. Section 2 presents the proposed method in detail. In Section 3, comprehensive experiments on the Middlebury and KITTI data sets verify that the proposed method improves on previous methods for stereo matching under radiometric distortions. Finally, we conclude our work in Section 4.

Pixel pair-based encoding pattern
In general, the census cost [7] and its improvements focus on the relationship between the centre pixel and the neighbouring pixels in the matching windows. These methods ignore the relationships between the neighbouring pixels themselves and fail to obtain robust results. Our encoding pattern does not encode the relative orders between the centre pixel and its neighbouring pixels. It is inspired by the traditional image feature extraction method HOG and extracts the relative orders on small local areas. For every pixel in the original images, a matching window centred at it is constructed with radius R (a positive integer, so that the window size 2R + 1 is odd). If a matching window is incomplete (e.g. its centre is near the image border), the missing pixels are filled with zeros to complete the window. For example, for the pixels near the upper border of the image in Figure 1, zeros are filled in to obtain the complete matching window. Then, in the matching window, pixels with the same radius are considered together. A matching window with radius R is divided into R isoline-like sets, in each of which every pixel has the same radius r (r ∈ {1, 2, …, R}). The centre pixel is expressed as p_c, and x_c and y_c are its coordinates. Then, as in FEP, the r-th isoline-like set centred at p_c is expressed as

S_r(x_c, y_c) = { p(x, y) : max(|x − x_c|, |y − y_c|) = r },   (1)

where x and y are p's coordinates.
In the proposed encoding pattern, pixels in the matching window are indexed around the centre. The centre pixel is indexed as 0, and the other pixels are indexed along the isoline-like sets. The pixels in S_1(x_c, y_c) are indexed as 1–8; similarly, the pixels in S_2(x_c, y_c) are indexed as 9–24. These pixels are expressed as p_1, p_2, …, p_24 according to their indexes. For example, Figure 2 shows the division of the isoline-like sets in a 5 × 5 matching window; S_1(x_c, y_c) and S_2(x_c, y_c) are shown in the middle and right images, respectively. To be consistent with HOG's methodology, the small local areas consist of the pixels indexed 1–8 and 9–24, respectively.
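The ring division described above can be sketched in a few lines of Python (the helper name `isoline_sets` and the offset layout are our own illustration, not code from the paper): offsets are grouped by their Chebyshev distance from the centre, which is exactly the radius r of the isoline-like sets.

```python
def isoline_sets(R):
    """Group window offsets by Chebyshev radius r = max(|dy|, |dx|).

    Hypothetical helper (names are ours): returns {r: [(dy, dx), ...]}
    for a (2R+1) x (2R+1) matching window, one list per isoline-like set.
    """
    sets = {r: [] for r in range(1, R + 1)}
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            r = max(abs(dy), abs(dx))
            if r > 0:               # radius 0 is the centre pixel itself
                sets[r].append((dy, dx))
    return sets

rings = isoline_sets(2)             # 5 x 5 window
# Each set S_r holds 8r pixels: len(rings[1]) == 8, len(rings[2]) == 16.
```

For a 9 × 9 window (R = 4), the four rings together cover the 80 non-centre pixels.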

FIGURE 3 The example of the pixel pairs of S_1(x_c, y_c) and S_2(x_c, y_c), respectively

Small local areas are considered in the proposed encoding pattern: the isoline-like sets are further divided into local areas. This helps avoid distortions, since severe distortions often appear over large areas. Since it is not easy to decide the size of a local area that avoids radiometric distortion, the proposed encoding pattern takes local relationships to the extreme, that is, only two neighbouring pixels are considered. For each isoline-like set, the two pixels with adjacent indexes are considered together. The indexes in the isoline-like sets are treated as rings, so the pixels indexed 8 and 1 are also adjacent, and p_8 and p_1 in S_1(x_c, y_c) are also considered together. We define two adjacent pixels in an isoline-like set as a pixel pair; for example, p_1 and p_2, p_2 and p_3, …, p_8 and p_1 are pixel pairs. The isoline-like sets are thus further divided into several pixel pairs. The pixel pairs of S_1(x_c, y_c) and S_2(x_c, y_c) are shown in Figure 3, indexed 1–24. Our encoding pattern is strongly related to the pixel pair, so we name it the pixel pair based encoding pattern (PPEP). A pixel pair can be considered an extremely small local area. We also consider that the design of the pixel pair yields more compact encoding information and reduces the total number of relative orders, since a pixel pair produces only one relative order.
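As a sketch of the ring-wise pairing described above (assuming the index ordering of Figure 2), each pixel is paired with the next index along its isoline-like set, and the last index wraps around to the first:

```python
def ring_pixel_pairs(indices):
    """Pair each index with its successor along the ring; the last index
    wraps around to pair with the first, so an n-pixel ring gives n pairs."""
    n = len(indices)
    return [(indices[i], indices[(i + 1) % n]) for i in range(n)]

# S_1 is indexed 1-8, so p_8 and p_1 also form a pair:
pairs_s1 = ring_pixel_pairs(list(range(1, 9)))
# → [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 1)]
```

Since each ring of 8r pixels yields exactly 8r pairs, a pixel pair contributes one relative order and the encoding stays compact.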
For each pixel pair, the relative order between the two pixels is encoded. The encoding function is defined as

ε = sign(I(p_q) − I(p_q′)) ⌈ |I(p_q) − I(p_q′)| / m ⌉,   (2)

where sign(⋅) is the sign function, q and q′ are the indexes of the two pixels in the pixel pair, I(⋅) is a pixel's intensity value, ⌈⋅⌉ returns the smallest integer not less than its argument, and m is a scale factor used to tune the interval of the function. To some extent, function (2) represents the intensity difference between the two pixels. For example, under Census, the order is 1 both for two pixels with intensity values 60 and 25 and for two pixels with intensity values 28 and 25; this ambiguity can be an obstacle. With encoding function (2), the ambiguity is avoided. In addition, the proposed encoding function (2) has no upper or lower bound, unlike the fuzzy encoding function in FEP. When (2) is used to encode, the scale factor m, which sets the size of the interval, is flexible to tune. For the isoline-like set S_r(x_c, y_c) (r ∈ {1, 2, …, R}), the minimum and maximum indexes of its pixel pairs are expressed as MIN_r and MAX_r, respectively. The relative order between the two pixels in the k-th (k ∈ {MIN_r, MIN_r + 1, …, MAX_r}) pixel pair is expressed as ε_{k,(x_c,y_c),I}, an integer calculated by function (2); the subscript I indicates that the relative order is established in the original grey image.
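A minimal sketch of encoding function (2), assuming plain integer intensities (the function name `encode` is ours):

```python
import math

def encode(iq, iq2, m):
    """sign(iq - iq2) * ceil(|iq - iq2| / m): the discretised relative order.
    m tunes the interval size; a larger m gives coarser, more noise-robust codes."""
    d = iq - iq2
    if d == 0:
        return 0
    sign = 1 if d > 0 else -1
    return sign * math.ceil(abs(d) / m)

# Census's ambiguity from the text, resolved with m = 15: both (60, 25) and
# (28, 25) get order 1 under Census, but here
# encode(60, 25, 15) == 3 while encode(28, 25, 15) == 1.
```

Note the function is antisymmetric (swapping the pixels flips the sign) and unbounded, unlike FEP's fuzzy encoding.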
In addition, the proposed method also considers the gradient images in the x and y directions. The image gradient reflects the local change trend between pixel values and brings edge information to some extent, since gradient values at edges differ from those in other areas. We consider that more robust and distinguishable relative orders can thereby be established. Let S_X and S_Y be the 3 × 3 Sobel matrices for computing the gradient images in the x and y directions, respectively. The gradient images can be computed as

I_X = S_X * I   (3)

and

I_Y = S_Y * I,   (4)

where * is the image convolution operator. As with the original grey image, the relative orders in the gradient images are obtained and expressed as ε_{k,(x_c,y_c),X} and ε_{k,(x_c,y_c),Y}, respectively. Then, the relative orders are concatenated:

ε_{k,(x_c,y_c)} = ε_{k,(x_c,y_c),I} ⊗ ε_{k,(x_c,y_c),X} ⊗ ε_{k,(x_c,y_c),Y},   (5)

where ⊗ is the concatenation operator. Then, every ε_{k,(x_c,y_c)} is concatenated; the total relative orders of the isoline-like set with radius r in the original grey image and the gradient images are expressed as

E_r(x_c, y_c) = ε_{MIN_r,(x_c,y_c)} ⊗ ε_{MIN_r+1,(x_c,y_c)} ⊗ ⋯ ⊗ ε_{MAX_r,(x_c,y_c)}.   (6)

Taking m = 15 and r = 1 as an example, Figure 4 shows the relative orders of the isoline-like set S_1(x_c, y_c) obtained in the original grey image and the gradient images; the total 24 relative orders are concatenated according to functions (5) and (6), so E_1(x_c, y_c) consists of 24 relative orders. Let

E(x_c, y_c) = E_1(x_c, y_c) ⊗ E_2(x_c, y_c) ⊗ ⋯ ⊗ E_R(x_c, y_c).   (7)

Every value of E(x_c, y_c) is a relative order of the corresponding isoline-like set in the original image or the gradient images. Clearly, E_1(x_c, y_c) consists of 24 relative orders, E_2(x_c, y_c) consists of 48 relative orders and E_r(x_c, y_c) consists of 24r relative orders. Thus, E(x_c, y_c) consists of 12R(R + 1) relative orders. Taking a 9 × 9 matching window with R = 4 as an example, E(x_c, y_c) consists of 240 relative orders.
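The descriptor-length bookkeeping above can be verified numerically (a quick sanity check, not paper code): ring r contributes 8r pixel pairs in each of the three images (grey, x-gradient, y-gradient), i.e. 24r relative orders, and summing over r gives 12R(R + 1):

```python
def total_relative_orders(R):
    """Sum 24*r over r = 1..R; the closed form is 12 * R * (R + 1)."""
    return sum(24 * r for r in range(1, R + 1))

assert total_relative_orders(1) == 24    # S_1 alone
assert total_relative_orders(2) == 72    # 24 + 48
assert total_relative_orders(4) == 240   # 9 x 9 window, R = 4
assert all(total_relative_orders(R) == 12 * R * (R + 1) for R in range(1, 10))
```

The 240-order descriptor for the 9 × 9 window is notably shorter than FEP's 408 integers per pixel, which is reflected in the running times reported later.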

Adaptively weighted cost
The relative orders are obtained from the different isolinelike sets with different radii. In order to get better stereo matching results, the influences of radii should be considered. An adaptively weighted cost for stereo matching is proposed.
Let I_left and I_right be the left and right images of a stereo pair, d_max the maximum disparity and d a disparity with d ∈ {0, 1, …, d_max}. Let p_ri be the pixel in I_right corresponding to p_c under disparity d, with coordinates x_c − d and y_c. Then, the adaptively weighted cost is defined as

C(x_c, y_c, d) = Σ_{r=1}^{R} exp(−r / R) Σ_i |E_r(x_c, y_c)_i − E_r(x_c − d, y_c)_i|,   (8)

where r ∈ {1, 2, …, R} and E_r(x_c, y_c)_i and E_r(x_c − d, y_c)_i are the values of pixels p_c and p_ri at index i in E_r(x_c, y_c) and E_r(x_c − d, y_c), respectively.
E_r(x_c, y_c) is a part of E(x_c, y_c). In (8), |E_r(x_c, y_c)_i − E_r(x_c − d, y_c)_i| computes the dissimilarity between a relative order of the pixel in the left image and that of its corresponding pixel in the right image. The adaptive weights exp(−r/R), which are related to the radii r of the isoline-like sets, give the dissimilarities different contributions. Gestalt grouping by proximity [28] means that points adjacent to the centre receive more attention in reality; the design of the adaptive weights follows this rule.
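A sketch of cost (8) in NumPy, assuming each pixel's descriptor is stored as a dict from ring radius r to an array of that ring's 24r relative orders (our data layout, not the paper's):

```python
import numpy as np

def ppep_cost(left_orders, right_orders, R):
    """Adaptively weighted dissimilarity between two PPEP descriptors.

    Rings nearer the centre get larger weights exp(-r/R), following
    gestalt grouping by proximity."""
    cost = 0.0
    for r in range(1, R + 1):
        weight = np.exp(-r / R)
        cost += weight * np.abs(left_orders[r] - right_orders[r]).sum()
    return cost

# Identical descriptors cost 0; a unit difference in ring r costs exp(-r/R),
# so inner rings contribute more than outer ones.
```

Evaluating this cost for every pixel and every d ∈ {0, …, d_max} fills the cost volume C that the local or global algorithms then operate on.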
The procedure for constructing the cost volume C of our algorithm is shown in Algorithm 1.

EXPERIMENTS
ALGORITHM 1 Constructing the cost volume C
1: Compute PPEP for each p_c in I_left using Equation (7)
2: Compute PPEP for each p_ri in I_right using Equation (7)
3: Compute the adaptive weights of the isoline-like sets' relative orders

Except for the CNN-based methods, all of our experiments are run on a PC with an Intel Core i5 3.00-GHz CPU and 8.00 GB of memory. The experiments with CNN-based methods are run on a GPU platform with a GTX Titan X. For comparison, the size of the matching windows is set to 9 × 9 in the experiments, as in FEP [12]. The methods are evaluated using the bad pixel error rate of each image in non-occluded areas, computed as

ER = (1 / |I_nocc|) Σ_{p_c ∈ I_nocc} E(p_c),   (9)

where I_nocc is the set of all non-occluded pixels, |I_nocc| is the number of pixels in I_nocc and E(p_c) is defined as

E(p_c) = 1 if |D_G(p_c) − D_E(p_c)| > t, and 0 otherwise,   (10)

where D_G(p_c) and D_E(p_c) are the ground truth and estimated disparities at p_c, respectively, and t is the error threshold, set to 1 and 4 for the Middlebury 2006 and 2014 data sets [16], respectively, and to 3 for the KITTI data set [17].
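The evaluation metric can be sketched as follows (array names are ours):

```python
import numpy as np

def bad_pixel_error_rate(d_gt, d_est, nocc_mask, thresh):
    """Fraction of non-occluded pixels whose absolute disparity error
    exceeds the threshold (1, 4 or 3 depending on the data set)."""
    bad = np.abs(d_gt - d_est) > thresh
    return bad[nocc_mask].mean()

d_gt  = np.array([[0.0, 1.0], [2.0, 3.0]])
d_est = np.array([[0.0, 1.0], [5.0, 3.0]])
mask  = np.ones_like(d_gt, dtype=bool)
# One of the four pixels is off by 3 > 1, so the error rate at threshold 1 is 0.25.
```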

Data set selection
The Middlebury 2006, Middlebury 2014 and KITTI 2015 data sets are used to evaluate the performance of the proposed method. The Middlebury 2006 data set, which consists of indoor stereo images collected under controlled conditions with different illuminations and exposures, is used to build image pairs with various radiometric conditions; these pairs are used to investigate the performance of the proposed method under radiometric distortions. In detail, the illuminations are indexed as 1, 2 and 3, and the exposures as 0, 1 and 2. In addition, the KITTI 2015 data set, which consists of real road-driving images, differs from the indoor stereo images of the Middlebury data sets, so it is also necessary to evaluate the proposed method on it. Since noise interference exists in outdoor shooting, it is not easy to obtain robust disparity results in outdoor circumstances. For our experiments, the 200 stereo image pairs (indexed from 0 to 199) of the training set, for which ground truth images are provided, are used.

The methods selected in the experiments
Obviously, the matching cost plays an important role in stereo matching. Once the matching cost is decided, local and global algorithms can be used to obtain more accurate disparities. Local algorithms perform local aggregation and compute the disparities with a winner-take-all strategy, while global algorithms optimise a cost function and obtain the disparities directly. In our experiments, the investigations of PPEP's parameter setting and of the adaptively weighted cost do not use local or global algorithms. To make solid comparisons, however, it is also necessary to conduct experiments that consist of the complete steps, in which a local algorithm and a global algorithm are used, respectively. In our experiments, the local algorithm Guided Filter (GF) [29] or the global algorithm Graph Cut (GC) [5] is combined with the matching cost algorithms; GF and GC are among the most widely used algorithms. When the matching cost algorithms are combined with GF or GC, the parameters of each method are fine-tuned to obtain the best results. Note that none of the experimental results involve any post-processing steps.
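The winner-take-all step mentioned above reduces to an argmin over the disparity axis of the cost volume (a sketch; the (d, H, W) layout of the cost volume is our assumption):

```python
import numpy as np

def wta_disparity(cost_volume):
    """Pick, for every pixel, the disparity whose cost is minimal.
    cost_volume has shape (d_max + 1, H, W)."""
    return np.argmin(cost_volume, axis=0)

costs = np.array([[[3.0, 1.0]],
                  [[2.0, 5.0]],
                  [[4.0, 6.0]]])   # d_max = 2, one row, two columns
# wta_disparity(costs) → [[1, 0]]
```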

Investigations of PPEP's parameter setting and adaptively weighted cost
The Middlebury 2006 data set with the various radiometric conditions mentioned above is used to investigate PPEP's parameter setting and the adaptively weighted cost. To avoid the effects of local or global algorithms, in this section PPEP is not combined with any stereo matching algorithm and the disparity results are obtained directly.
(1) Investigation of PPEP's parameter setting: The parameter m in (2) is the only parameter of PPEP; it determines the size of the interval of function (2). Experiments with different settings of m are conducted. As shown in Table 1, PPEP with m = 25 obtains the best results. Experimentally, m should be neither too large nor too small. A small m gives a small interval, so the intensity difference of two pixels may be represented more accurately; however, a very small m makes the method easily influenced by noise, since slight changes in the matching pixels' intensities can change the relative orders. A large m makes the function robust to noise but may reduce accuracy. The value of m thus balances accuracy and robustness. Note that in the subsequent experiments, m is set to 25.
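The accuracy/robustness trade-off of m can be illustrated with the encoding function, re-stated here so the snippet is self-contained: a small intensity perturbation changes the code under a fine interval but not under m = 25.

```python
import math

def encode(a, b, m):
    """sign(a - b) * ceil(|a - b| / m), as in function (2)."""
    d = a - b
    return 0 if d == 0 else (1 if d > 0 else -1) * math.ceil(abs(d) / m)

clean, noisy = (28, 25), (31, 25)      # +3 intensity noise on one pixel
# Fine interval, m = 5: the codes differ (ceil(3/5) = 1 vs ceil(6/5) = 2),
# so the relative order is noise-sensitive.
# Coarse interval, m = 25: both give code 1, robust to the perturbation
# but blind to the 3-level intensity difference.
```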
(2) Investigation of the adaptively weighted cost: The adaptive weights are used in function (8). To verify the improvement of (8) in matching accuracy, experiments are conducted comparing the adaptively weighted cost with an unweighted cost, as in Census [7] or FEP [12]. As shown in Table 2, using the adaptively weighted cost obtains better results. The results show that weighting the relative orders according to the radii is helpful for obtaining accurate results.

Longitudinal comparative analysis of PPEP
In this section, the effect of each part of PPEP is discussed. PPEP can be summarised into two parts: the encoding pattern and the encoding function. To analyse the improvement due to each part, Census [7] is selected as the baseline, and the two parts of PPEP are introduced on top of it step by step. Table 3 shows the matching results on the Middlebury 2006 data set with different configurations of illuminations and exposures when these methods are combined with GC. (All bold values appearing in the tables highlight the data with the lowest error rates.) As listed in Table 3, the error rates are 13.70%, 11.38% and 10.57%, respectively, so the two parts of PPEP improve the performance effectively. The encoding pattern is the most important part of the proposed algorithm: it brings a 2.32% improvement, while the encoding function brings a 0.81% improvement. We consider the encoding pattern to be the core of PPEP, with the encoding function improving results on top of it. To some extent, the encoding function improves the matching accuracy.

Comparisons between AD, Census, FEP and PPEP
The authors of [12] compared the performance of the fuzzy encoding pattern (FEP) with other conventional stereo matching cost algorithms, such as the Census transform (Census) [7], AD [6], ANCC [9], the mutual information-based cost (MI) [10], MI with the SIFT descriptor (MIS) [30] and support LBP (SLBP) [11], and FEP [12] showed clear improvements. Therefore, in our experiments PPEP is compared directly with FEP. Since AD and Census are classical algorithms widely used in many applications, PPEP is also compared with AD and Census.
(1) Middlebury data sets: We evaluate the above matching cost algorithms using the local and global methods on the Middlebury 2006 and Middlebury 2014 data sets. Tables 4 and 5 show the evaluation results using GF and GC, respectively. AD, which directly uses the pixels' intensity values, is the simplest algorithm; as listed in Tables 4 and 5, it obtains the worst results, especially on the Middlebury 2006 data set, which includes image pairs with various radiometric conditions. Compared with Census using GF and GC, respectively, FEP obtains improvements of 1.86% and 2.27% on the Middlebury 2006 data set. However, FEP performs worse than Census on the Middlebury 2014 data set. In PPEP, adjacent pixels tend to undergo similar changes under radiometric distortions, so the encoding values between adjacent pixels have better invariance and the encoding features of PPEP are relatively robust. The encoding pattern could be one reason that PPEP obtains more robust results than FEP. In addition, the matching cost computation of PPEP assigns different weights to the relative orders with different radii, while FEP does not; we consider that the adaptive weights help improve the accuracy of the matching results. Figure 5 shows the results on the Motorcycle stereo images with different exposures and illuminations using GC; it can be seen that the proposed PPEP obtains better matching results (fewer black areas).
(2) KITTI data set: We evaluate the above matching cost algorithms on the stereo pairs of the KITTI 2015 data set. Tables 6 and 7 show the comparison results in non-occluded areas using GF and GC, respectively. The average error rate of AD is still much higher than those of the other algorithms; obviously, AD is not suitable for complex situations. On the KITTI 2015 data set, FEP performs better than AD and Census: its error rates using GF and GC are 10.92% and 8.01%, improvements of 0.72% and 0.67% over Census, respectively. PPEP has the best performance, 10.43% with GF and 7.54% with GC, improvements of 0.49% and 0.47% over FEP, respectively. Overall, PPEP performs best on all data sets tested in the experiments.

Comparisons between PPEP and CNN-based methods
In stereo matching, deep learning techniques are widely used and obtain state-of-the-art results. Deep learning based methods require a sufficient training set for the specific scenario; the neural networks can extract informative and representative features by using typical network structures, and CNN-based methods can obtain excellent results on the test set given a suitable and sufficient training data set from the same domain. The accurate model of MC-CNN [15] is trained on the Middlebury data set, excluding the images with different configurations of illuminations and exposures, and is named MCMB. Like PPEP, it is combined with GC and tested on the Middlebury images with different configurations of illuminations and exposures; thus MCMB has the same type of training set and test set. The total average error rate of MCMB in non-occluded areas is 8.38%, while PPEP's is 10.08% in our experiment, so MCMB performs better than PPEP on average. However, CNN-based methods rely heavily on the specific data set: they can over-specialise in the training domain and are not easy to transfer to unfamiliar scenes. When the training set and the test set belong to different types, the performance of CNN-based methods may deteriorate heavily, whereas traditional methods do not need training and may suit a wide range of situations.
To further explore the generalisation ability of CNN-based methods when the training set and the test set belong to different types, related experiments are conducted between the proposed method and the CNN-based methods. In this comparison, all of the CNN-based methods are trained on the KITTI 2015 data set and tested on the Middlebury images with different configurations of illuminations and exposures. The comparison includes three CNN-based methods: MCKI [15], EDLSM [22] and CNNF [25]. MCKI also uses the accurate model of MC-CNN, like MCMB; the difference between them is the training data set. EDLSM and CNNF are CNN-based methods trained for feature point matching and can be used to obtain the matching cost, like MCKI. For MCKI and CNNF, the source code and the pre-trained models on the
KITTI 2015 data set are used. Using TensorFlow, EDLSM is implemented with the framework and parameters of [22]. As in Section 3.5, the global method GC [5] is used directly with the CNN-based methods. Table 8 shows the results of PPEP, MCKI, CNNF and EDLSM. The results of the CNN-based methods are worse than those of PPEP when the training set and the test set belong to different types. We consider that these CNN-based methods depend severely on the training set and suffer from a generalisation problem. With unsuitable or insufficient training sets, the performance of CNN-based methods may deteriorate heavily. Conversely, PPEP can be a better option in the absence of sufficient samples: it can be applied directly to any real scenario without training.

Tests of the running time
In this section, the running times of all of the compared methods are tested. As shown in Table 9, Census needs only 0.51 s to construct the disparity space image on a single CPU core; it is the simplest method and is faster than the others. FEP and PPEP need 5.42 s and 3.41 s, respectively, so PPEP is faster than FEP. The reason could be that, for the 9 × 9 matching window, PPEP uses 240 relative orders while FEP uses 408. Among the CNN-based methods, MCKI, EDLSM and CNNF need 83.54 s, 3.07 s and 3.64 s, respectively. PPEP's running time is close to those of CNNF and EDLSM, while MCKI has the longest running time of all the methods in the experiments. In Table 8, MCKI's error rate is the closest to PPEP's among the CNN-based methods, but MCKI has low running efficiency. Considering both the accuracy shown in Table 8 and the running times reported in this section, PPEP is well suited to stereo matching tasks.

CONCLUSION
PPEP is proposed for stereo matching. Based on pixel pairs, it performs robustly under radiometric distortions. The new encoding function and the adaptively weighted cost further improve accuracy. The experimental results show that the proposed method is superior to some widely used methods in terms of the average bad pixel error rate, and PPEP is faster than FEP in our experiments. In future work, we will focus on the accuracy and the computational complexity of the encoding for stereo matching.