CFA-Based Splicing Forgery Localization Method via Statistical Analysis
Abstract
The color filter array (CFA) of a camera is an effective fingerprint for digital forensics. Most previous CFA-based forgery localization methods operate under the assumption that the interpolation algorithm is linear. However, the interpolation algorithms commonly used in digital cameras are nonlinear, and their coefficients vary with content to enhance edge information. To avoid the impact of this impractical assumption, a CFA-based forgery localization method independent of the linear assumption is proposed. The probability of an interpolated pixel value falling within the range of its neighboring acquired pixel values is computed. This probability serves to discern the presence or absence of CFA artifacts and to distinguish between various interpolation techniques. Subsequently, curvature analysis is employed to select suitable features for generating the tampering probability map. Experimental results on the Columbia and Korus datasets indicate that the proposed method outperforms state-of-the-art methods and is also more robust to various attacks, such as noise addition, Gaussian filtering, and JPEG compression with a quality factor of 90.
1. Introduction
With the rapid development of image editing technologies, digital image manipulation has become increasingly easy to perform. Unfortunately, tampered images can cause real harm through rapid distribution on the Internet. Consequently, image forensics, aimed at forgery detection and localization or at camera identification, has attracted significant attention in recent years [1]. In practical forensic applications, researchers are more interested in forgery localization, i.e., locating tampered regions, than in other goals [2].
Most forgery localization methods can be classified as physics-based or statistical. Physics-based methods study physical inconsistencies in images, such as the direction of incident light [3], illumination color [4], or shading and shadows [5]. These methods analyze the overall image information with physical models and are robust to most image postprocessing, such as resizing and recompression. Although they perform well in controlled scenes, they are seldom applicable to real-world images [6].
The most successful and widespread forgery localization methods are statistical. They rely on intrinsic fingerprints left in the image during the capture process, such as noise level [7, 8], lens aberration [9], or the color filter array (CFA) [10, 11]. Although these efficient methods have been widely used, their localization performance degrades significantly for images that have undergone postprocessing, such as median filtering.
Fortunately, most postprocessing operations can be revealed, including resampling [12, 13], median filtering [14, 15], and contrast enhancement [16, 17]. Moreover, individual forgery localization methods can be treated as tools, and a fusion framework combining different tools can offset their respective drawbacks and limitations in practical applications. Fontani et al. [18] employed Dempster–Shafer theory to define a fusion framework for image forensics that can easily be extended with new tools. Jeong et al. [19] proposed to identify the types of image forgery using a set of mixed statistical moments. Furthermore, Cozzolino et al. [20] fused the outputs of two fine-tuned algorithms to exploit their respective strengths and weaknesses; this technique obtained the best score in phase 1 of the first Image Forensics Challenge in 2013. Because statistical methods serve as the tools within such fusion frameworks, improving any single statistical method remains worthwhile.
In this paper, we propose a novel CFA-based forgery localization method. Most previous CFA-based methods assume that the interpolation algorithms used in digital cameras are linear, thereby simplifying the model. However, the interpolation algorithms actually used are often nonlinear [21], which reduces the performance of these methods in practical applications. For nonlinear interpolation algorithms, the coefficients may vary with image content, but the domain of acquired pixels used for interpolation can be assumed constant. The interpolation process is similar to low-pass filtering, making the interpolated pixel value linearly related to the acquired pixel values in this domain. Therefore, we calculate the probability that an interpolated pixel value falls within the range of its neighboring acquired pixel values inside a predicted window, and normalize it to obtain a new feature. Finally, the expectation–maximization algorithm and curvature are employed for statistical distribution analysis to obtain the tampering probability map. This method is independent of the linear assumption and insensitive to content, resulting in improved performance. The experimental results show that the proposed method outperforms the reference methods and is more robust to attacks than other CFA-based methods.
The main contributions of this paper can be summarized as follows. (1) A content-insensitive CFA fingerprint is proposed for forgery localization. (2) Curvature is used to automatically determine whether a statistical feature can distinguish between original and tampered regions. (3) Experiments on publicly available datasets show that the proposed method outperforms the reference methods.
The rest of this paper is organized as follows. Section 2 reviews previous work on CFA-based image forensics. Section 3 presents the theory of the proposed CFA-based forgery localization method. Section 4 describes the experimental evaluation, and Section 5 concludes this work.
2. Related Works
Commercial digital cameras are equipped with a CFA in front of the image sensor, so captured images carry only a single color sample at each pixel location. To obtain a three-channel color image, an interpolation algorithm is employed to estimate the other two color samples. For the most widely used Bayer CFA, the green pixels are sampled on a quincunx lattice, while the red and blue pixels are sampled at the complementary locations. This CFA has four configurations: RGGB, BGGR, GRBG, and GBRG. The top-left corner of a CFA image with the RGGB configuration is illustrated in Figure 1.
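To make the sampling geometry concrete, the following sketch (our own Python illustration, not code from the paper) builds boolean sampling masks for a 2 × 2-periodic Bayer pattern; for RGGB, the green samples occupy the quincunx lattice where the row and column indices have different parity:

```python
import numpy as np

def bayer_masks(height, width, pattern="RGGB"):
    """Boolean sampling masks for a 2x2-periodic Bayer CFA.

    `pattern` lists the colors of the 2x2 tile in row-major order,
    e.g. "RGGB" puts R at (0,0), G at (0,1) and (1,0), B at (1,1).
    """
    masks = {c: np.zeros((height, width), dtype=bool) for c in "RGB"}
    for idx, color in enumerate(pattern):  # idx runs over the 2x2 tile
        masks[color][idx // 2::2, idx % 2::2] = True
    return masks

masks = bayer_masks(4, 4, "RGGB")
# Half the sites are green, a quarter each red and blue.
assert masks["G"].sum() == 8 and masks["R"].sum() == 4
```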
The specific correlations introduced by CFA interpolation can be quantified for image forensics. Popescu and Farid [10] introduced the expectation–maximization (EM) algorithm to estimate the interpolation coefficients and obtained the probability of each pixel being correlated with its adjacent pixels. The periodicity of this probability map, which derives from the interpolation artifacts, is particularly prominent in the Fourier domain. Bammey et al. [22] replaced the iterative EM algorithm with a least-squares optimal filter. Furthermore, Fernández et al. [23] estimated the interpolation coefficients with the ordinary least squares algorithm and applied the discrete cosine transform on small blocks for forgery localization. The main advantage of these methods is that a wide range of modifications can be detected without prior training or knowledge. However, they rely on the estimation of interpolation coefficients, which significantly increases the computational burden.
Ferrara et al. [27] proposed a feature based on the prediction error variance to measure the presence or absence of CFA traces, obtaining a fine-grained tampering probability map that can detect small forgeries. Singh et al. [28] introduced a Markov random process to reduce false detections and computational complexity, building on the study of Ferrara et al. [27]. Lu et al. [29] applied a breadth-first search neighbor clustering algorithm to detect source and duplicated regions in copy–move images and then localized the duplicated regions based on the prediction error. Furthermore, Chang et al. [30] detected photographic images and identified device classes based on the Fourier spectrum of the prediction error variances.
Although these prediction-error-based methods have achieved good performance in various image forensics tasks, their linear interpolation assumption degrades their performance in practical applications. Most of the interpolation algorithms used in cameras are nonlinear, and their coefficients vary with the gradient to enhance edge information. As a result, these methods are sensitive to image content and sometimes even fail to extract CFA fingerprints at all.
3. The Proposed Method
Similar to most previous CFA-based splicing forgery localization methods, we study the familiar Bayer CFA in the green channel. In every 2 × 2 block of the green channel, the numbers of acquired and interpolated pixels are equal, and the two kinds of pixels can be separated according to even and odd locations. In the red and blue channels, by contrast, the interpolated pixels occupy three of the four locations in each 2 × 2 block, making the decomposition more complex. Consequently, extracting CFA features from the green channel effectively reduces computational complexity. The proposed forgery localization framework is illustrated in Figure 2.
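As a minimal illustration (our own sketch; the parity of the acquired lattice depends on the CFA configuration and must be known or estimated in practice), the green channel can be split as follows:

```python
import numpy as np

def split_green_lattices(green, acquired_parity=1):
    """Split the green channel into acquired and interpolated pixels.

    Under a Bayer CFA, the acquired green samples lie on one
    checkerboard lattice ((row + col) % 2 == acquired_parity) and the
    interpolated samples on the complementary one, so every 2x2 block
    contributes equally to both sets.
    """
    rows, cols = np.indices(green.shape)
    acquired_mask = ((rows + cols) % 2) == acquired_parity
    return acquired_mask, green[acquired_mask], green[~acquired_mask]
```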
Let Nr be the real window size N used by the interpolation algorithm of the camera. For example, Nr equals 1 for the bilinear interpolation algorithm and 2 for the gradient-based interpolation algorithm [10].
The probability that G(x0, y0) satisfies Equation (4) is defined as Pint. When G(x0, y0) is the acquired pixel, Pint is denoted as PA; when G(x0, y0) is the interpolated pixel, Pint is denoted as PI. Obviously, PA < 1 and PI = 1.
Generally, the in-camera interpolation algorithm is unknown, so Nr is also unknown. Therefore, a predicted window size, denoted Np, is used instead. PI can take different values depending on the relationship between Nr and Np.
As shown in Figure 3, the yellow window denotes the real window containing the acquired pixels used for interpolation, namely Nr = 3. The red windows denote the predicted windows, i.e., Np = 1 and Np = 4. Moreover, the dark green cells denote larger interpolation coefficients and the pale green cells smaller ones. For the larger red window (Np ≥ Nr), QN(x0, y0) contains all acquired pixel values used for interpolation, and G(x0, y0) is linearly correlated with them, resulting in PI = 1. However, for the smaller red window (Np < Nr), some of the acquired pixel values used for interpolation are not within QN(x0, y0), resulting in PI < 1.
For most interpolation algorithms, the acquired pixel values closest to the interpolated pixel have higher weights and contribute most to the interpolated value. Therefore, even when Np < Nr, G(x0, y0) and QN(x0, y0) remain strongly correlated, and PA < PI, which can be used to distinguish interpolated pixels from acquired pixels. Additionally, since PI is mainly affected by the difference between Nr and Np, it is constant for a given interpolation algorithm. Consequently, PI can be used to differentiate interpolation algorithms, and it is insensitive to image content.
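The feature itself can be sketched as follows (an unoptimized illustration of our own, assuming QN(x0, y0) collects the acquired green samples inside the (2Np + 1) × (2Np + 1) window centered at (x0, y0); the paper's exact window definition may differ):

```python
import numpy as np

def range_hit_map(green, acquired_mask, n_p=1):
    """For every pixel, test whether its value lies within the
    [min, max] range of the acquired green samples in its
    (2*n_p + 1) x (2*n_p + 1) window.

    Averaging the result over interpolated (resp. acquired) locations
    gives an estimate of P_I (resp. P_A).
    """
    h, w = green.shape
    k = 2 * n_p + 1
    g = np.pad(green.astype(float), n_p, mode="reflect")
    m = np.pad(acquired_mask, n_p, mode="constant", constant_values=False)
    hits = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            win = g[y:y + k, x:x + k]
            sel = m[y:y + k, x:x + k].copy()
            sel[n_p, n_p] = False  # compare against neighbors only
            vals = win[sel]
            if vals.size:
                hits[y, x] = vals.min() <= green[y, x] <= vals.max()
    return hits

# Estimates over the whole image (or per block):
# p_i = hits[~acquired_mask].mean(); p_a = hits[acquired_mask].mean()
```

In a tampered region where the CFA artifacts have been destroyed, the estimated PI drops toward PA, which is what the subsequent block-wise statistical analysis exploits.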
In the same way, we obtain Lab2 from L2. Ultimately, we choose the appropriate feature as the tampering probability map through Lab1 and Lab2: when Lab1 = 1 and Lab2 = 0, L1 is used; when Lab1 = 0 and Lab2 = 1, L2 is used; when Lab1 = 1 and Lab2 = 1, both are usable, and we choose L1 empirically.
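In code, this selection rule is a simple branch (hypothetical names mirroring the notation above; the fallback for Lab1 = Lab2 = 0 is our assumption, as that case is not covered in the text):

```python
def select_feature_map(L1, L2, lab1, lab2):
    """Choose the tampering probability map from the curvature labels.

    lab1/lab2 equal 1 when the corresponding feature separates
    original from tampered regions; L1 is preferred when both qualify,
    matching the empirical choice described above.
    """
    if lab1 == 1:
        return L1  # covers (lab1, lab2) = (1, 0) and the empirical (1, 1)
    if lab2 == 1:
        return L2
    return None  # assumed fallback: neither feature is discriminative
```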
4. Experiment Evaluation
In this section, we conduct experiments to evaluate the performance of the proposed method. The evaluation uses the Columbia Uncompressed Image Splicing Detection Evaluation Dataset (Columbia dataset) [33] and the Realistic Tampering Dataset (Korus dataset) [34]. The Columbia dataset was acquired with four cameras (Canon G3, Nikon D70, Canon 350D Rebel XT, and Kodak DCS 330), and 15% of the images were taken outdoors. Images from each pair of cameras were spliced to obtain 30 tampered images; with six camera pairs in total, this yields 180 spliced images. The sizes of these forgery images range from 757 × 568 to 1,152 × 768, and the number of pixels in the tampered region is relatively large. The Korus dataset contains 220 realistic forgeries created by hand in modern photo-editing software (GIMP and Affinity Photo) and covers various challenging tampering scenarios involving both object insertion and removal. The original images were captured by four different cameras (Sony α57, Canon 60D, Nikon D7000, and Nikon D90), and the final forgery images are 1,920 × 1,080 px. In both datasets, each image underwent a single manipulation without any postprocessing and is saved in uncompressed TIFF format, which helps preserve the CFA features. We considered only reference methods that require no training or other prior information: CFA1 [27], CFA2, CFA3 [35], BLK [36], CAGI [37], NOI1 [38], and NOI5 [39]. For details of the reference methods and their source code, please refer to Zampoglou et al.'s [40] study.
4.1. Performance Criteria
The Matthews correlation coefficient (MCC) is robust to unbalanced classes. For some forgery images in the Korus dataset, the tampered region is much smaller than the original one, making MCC the more appropriate criterion for evaluating the various methods.
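For reference, MCC can be computed from a binary localization map and the ground truth as follows (a standard implementation, not code from the paper):

```python
import numpy as np

def mcc(pred, gt):
    """Matthews correlation coefficient of a binary localization map.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    which stays informative when tampered pixels are a small minority.
    """
    pred, gt = pred.astype(bool).ravel(), gt.astype(bool).ravel()
    tp = int(np.sum(pred & gt))
    tn = int(np.sum(~pred & ~gt))
    fp = int(np.sum(pred & ~gt))
    fn = int(np.sum(~pred & gt))
    # Cast to float before multiplying to avoid integer overflow.
    denom = (float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0
```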
Since the criteria operate on binary maps while most methods produce heatmaps with continuous values, a threshold is needed to convert these heatmaps into binary maps. However, a single fixed threshold would bias the results of the different methods. Therefore, for each method, the threshold that maximizes the criterion is used. In addition, some methods merely distinguish between original and tampered regions, so the output heatmap may have inverted polarity with respect to the ground truth. Consequently, we consider both the original and inverted ground-truth images and keep the better result.
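A sketch of this protocol (our reading; flipping the prediction polarity plays the same role as comparing against the inverted ground truth):

```python
import numpy as np

def best_binary_map(heatmap, gt, criterion, n_steps=101):
    """Binarize a continuous heatmap with the threshold (and polarity)
    that maximizes the given criterion, e.g. the mcc function above."""
    best_score, best_map = -np.inf, None
    for t in np.linspace(heatmap.min(), heatmap.max(), n_steps):
        for binary in (heatmap >= t, heatmap < t):  # both polarities
            score = criterion(binary, gt)
            if score > best_score:
                best_score, best_map = score, binary
    return best_map, best_score
```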
4.2. Parameter Discussion
The proposed method is affected by two parameters, Np and b. We assess the effect of three predicted window sizes, Np = 1, 3, 5. Additionally, to assess the impact of b, we evaluate the performance for five block sizes: 5, 25, 45, 65, and 85. To speed up the computation, we use the Columbia dataset, which has a lower image resolution than the Korus dataset, and measure performance with MIoU scores and the efficiency ratio E.
Figure 4 shows the MIoU scores of four forgery images under different parameter settings. For all four images, the MIoU scores increase with the block size, and the best results in this experiment are obtained with Np = 1 and b = 85. Notably, the improvement at b = 85 over b = 65 is small, while the computational cost is higher; therefore, we set b to 65 instead of 85 in subsequent experiments.
To evaluate the impact of the parameters in detail, we first evaluate the proposed method on the Columbia dataset with b = 65 and Np taking the same values as in the previous experiment. Figure 5(a) shows the efficiency ratio E at different thresholds α for the three predicted window sizes. At each α, the proposed method with a predicted window size of 1 outperforms the other two sizes. For example, at a valid threshold of 0.5, E reaches 76.11% with a predicted window size of 1, versus 72.22% with a window size of 3; at a valid threshold of 0.8, E reaches 40.55% versus 32.77%. This experiment shows that the proposed method performs better with a small predicted window size, so Np is set to 1.
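We read the efficiency ratio E(α) as the fraction of test images whose MIoU exceeds the validity threshold α; this is our interpretation of the quoted percentages, stated here as an assumption:

```python
import numpy as np

def efficiency_ratio(miou_scores, alpha):
    """Fraction of images whose MIoU exceeds alpha (assumed reading
    of the efficiency ratio E used in this evaluation)."""
    return float(np.mean(np.asarray(miou_scores, dtype=float) > alpha))

# e.g. with Np = 1 on Columbia: efficiency_ratio(scores, 0.5) == 0.7611
```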
We follow the same protocol to assess the impact of the block size b, evaluating the proposed method on the Columbia dataset with Np = 1 and b taking the same values as in the previous experiment. Figure 5(b) shows the efficiency ratio E at different valid thresholds for the five block sizes. The method performs poorly when the block size is 5 and particularly well when it is 65 or 85. In addition, for b larger than 65, increasing b only slightly improves performance. We therefore recommend setting b to 65.
4.3. Comparative Experiments
We compare the proposed method against the reference methods using three criteria on the two datasets; to evaluate comprehensive performance, each dataset is assessed with all three criteria. For the proposed method, Np is set to 1 and b to 65. Table 1 shows the results with respect to average MIoU, average MPA, and average MCC on the Columbia and Korus datasets.
| Method | Columbia MIoU | Columbia MPA | Columbia MCC | Korus MIoU | Korus MPA | Korus MCC |
|---|---|---|---|---|---|---|
| Ours | 0.7659 | 0.8561 | 0.7255 | 0.6193 | 0.7814 | 0.4247 |
| CFA1 | 0.6914 | 0.7806 | 0.6052 | 0.6154 | 0.7403 | 0.3957 |
| CFA2 | 0.5776 | 0.7151 | 0.4554 | 0.5368 | 0.6726 | 0.2328 |
| CFA3 | 0.7542 | 0.8507 | 0.7193 | 0.6155 | 0.7588 | 0.4055 |
| BLK | 0.4869 | 0.6384 | 0.2957 | 0.4935 | 0.6278 | 0.1352 |
| CAGI | 0.5817 | 0.7396 | 0.4706 | 0.5169 | 0.7060 | 0.2229 |
| NOI1 | 0.5011 | 0.6558 | 0.3263 | 0.5136 | 0.6504 | 0.1940 |
| NOI5 | 0.4041 | 0.6031 | 0.2119 | 0.3695 | 0.5793 | 0.0734 |
We start our evaluation with a comparison of the overall performance on the two datasets. Notably, for the proposed method, the average MIoU score on the Columbia dataset is 23.67% better than that on the Korus dataset, and the average MCC score is 70.82% better. In fact, the complexity of the scenes in the Korus dataset makes it a challenging test: all methods perform much worse on it than on the Columbia dataset. Additionally, the small tampered regions in the Korus dataset limit the usefulness of MIoU and MPA. It is therefore reasonable to evaluate performance on the Korus dataset with MCC, which is robust to unbalanced classes, and on the Columbia dataset, with its large tampered regions, with the widely used MIoU. Regardless of the criterion, the proposed method ranks first on both datasets.
Additionally, we can readily observe that the CFA-based methods perform better than the other four methods, consistent with the findings of Popescu and Farid [10] and with our experimental results. Moreover, experiments in Ferrara et al.'s [27] study show that their CFA-based method has a low false positive rate, achieving 0% in their simulated tampering, which is an important advantage of CFA-based methods. The images in the two datasets used in our experiments are in uncompressed TIFF format, which preserves the CFA fingerprints well. The advantages of CFA-based methods are therefore clearly exhibited, allowing them to outperform the other methods.
To visually compare the methods, Figure 6 shows example heatmaps of the localization results. Overall, CFA1, CFA3, and the proposed method locate tampered regions better than the other three methods. In the first and second rows, CFA1 detects detailed parts of the tampered region but produces many false alarms, which degrade its scores. CFA3 produces few false alarms, but its localization is coarse and much detail is lost; its low false-alarm rate nevertheless lets it score higher than CFA1. The proposed method detects the details of the tampered regions with few false detections.
4.4. Robustness Analysis
The experiments in the previous section demonstrated the robustness of the proposed method to complex scenes. We now test the robustness of the CFA-based methods against various attacks. Since many whole-image postprocessing operations completely destroy the CFA fingerprints, we consider only three attacks: noise addition, Gaussian filtering, and JPEG compression.
Compared to the Korus dataset, the image resolution of the Columbia dataset is lower. Therefore, this subsection uses the Columbia dataset to reduce the computational cost. Three new datasets were generated by attacking the Columbia dataset. (1) We added Gaussian noise (20 dB) to the images to obtain the noise addition dataset. (2) Filtering resembles the interpolation process, and most filters, such as median and mean filters, destroy the CFA fingerprints; the Gaussian filtering dataset is obtained with a filter size of 3 and a standard deviation of 0.29. (3) Ferrara et al. [27] tested the sensitivity of their CFA-based method to JPEG compression and found that performance drops quickly when the quality factor is below 90; we therefore used the "imwrite" function in MATLAB to obtain the JPEG compressed dataset with a quality factor of 90.
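The three attacks can be reproduced roughly as follows (a Python sketch of our own; the paper generated the JPEG images with MATLAB's imwrite, and we interpret "20 dB" as the signal-to-noise ratio of the added noise):

```python
import io
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

def attack_image(img_u8, kind, rng=None):
    """Apply one of the three attacks to an RGB uint8 image."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = img_u8.astype(float)
    if kind == "noise":
        # Additive white Gaussian noise at 20 dB SNR (assumed reading).
        noise_power = np.mean(x ** 2) / 10 ** (20 / 10)
        y = x + rng.normal(0.0, np.sqrt(noise_power), x.shape)
    elif kind == "gaussian":
        # Per-channel Gaussian filter, sigma = 0.29; scipy's default
        # truncation yields the 3x3 spatial kernel used in the paper.
        y = gaussian_filter(x, sigma=(0.29, 0.29, 0))
    elif kind == "jpeg":
        buf = io.BytesIO()
        Image.fromarray(img_u8).save(buf, format="JPEG", quality=90)
        y = np.asarray(Image.open(buf), dtype=float)
    else:
        raise ValueError(kind)
    return np.clip(y, 0, 255).astype(np.uint8)
```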
Figure 7 illustrates the efficiency ratio E under the various attacks. The proposed method clearly outperforms the other CFA-based methods under noise addition and JPEG compression. As Figure 6 shows, the proposed method gives fine-grained localization results; it therefore achieves high MIoU scores but is also sensitive to noise in the extracted feature. For images after the Gaussian filtering attack, the proposed method yields many high and many low MIoU scores but few intermediate ones, i.e., its E score is comparatively high when α is greater than 0.6 but comparatively low when α is between 0.5 and 0.6. CFA3 produces coarse localization results and is thus less sensitive to noise in the extracted features: it obtains few high MIoU scores but many intermediate ones, so its E score is high when α is between 0.5 and 0.6 but decreases rapidly when α exceeds 0.6. For a more intuitive view, Table 2 shows the average MIoU scores on the Columbia dataset under the various attacks. On all three new datasets, the proposed method ranks first; it is 12.31%, 18.65%, and 24.53% better than the second-best method (CFA3), a much larger margin than the 1.55% on the unattacked Columbia dataset. Although the performance of CFA-based methods degrades significantly under these attacks, the proposed method retains a clear advantage over the other CFA-based methods.
| Method | Noise addition | Gaussian filtering | JPEG compression |
|---|---|---|---|
| Proposed | 0.4978 | 0.6081 | 0.6102 |
| CFA1 | 0.3958 | 0.4319 | 0.4292 |
| CFA2 | 0.4040 | 0.4442 | 0.4137 |
| CFA3 | 0.4432 | 0.5125 | 0.4900 |
5. Conclusion
In this paper, we propose a CFA-based forgery localization method. Most previous CFA-based methods assume that the interpolation algorithm is linear, which is impractical for commercial cameras. In contrast, the proposed method is based on the observation that an interpolated pixel value falls within the range of its neighboring acquired pixel values, which holds for both linear and nonlinear interpolation algorithms. The proposed method outperforms the reference methods and is more robust to the tested attacks.
CFA-based forgery localization mainly targets raw or uncompressed images. Although such images are rare in daily life, they are still used in certain fields, such as copyright protection. For these images, CFA-based methods have a low false detection rate and outperform most alternatives, so they remain useful tools in practice. In the future, we will try to combine the CFA-based method with other complementary methods to broaden its practical applicability.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Technical Research Program of Ministry of Public Security (2020JSYJC25) and the Open Project of Key Laboratory of Forensic Science of Ministry of Justice (KF202317).