Face morph detection for unknown morphing algorithms and image sources: a multi-scale block local binary pattern fusion approach

The vulnerability of face recognition systems against so-called morphing attacks has been revealed in the past years. Recently, different kinds of morphing attack detection approaches have been proposed. However, the vast majority of published results has been obtained from rather constrained experimental setups. In particular, most investigations do not consider variations in morphing techniques, image sources, and image post-processing. Hence, reported performance rates cannot be maintained in realistic scenarios, as the NIST FRVT MORPH performance evaluation showed. In this work, existing algorithms are benchmarked on a new, more realistic database. This database consists of two different data sets, from which morphs were created using four different morphing algorithms. In addition, the database contains four different post-processings (including print-scan transformation and JPEG2000 compression). Further, a new morphing attack detection method based on a fusion of different configurations of Multi-scale Block Local Binary Patterns (MB-LBP) on an image divided into multiple cells is presented. The proposed score-level fusion of a maximum number of 18 different configurations is shown to significantly improve the robustness of the resulting morphing attack detection scheme, yielding an average performance between 2.26% and 8.52% in terms of Detection Equal Error Rate (D-EER), depending on the applied post-processing.


Introduction
Image manipulation techniques can be applied to substantially change the appearance of face images and hence negatively affect the recognition accuracy and security of face recognition systems (FRSs). Face alteration methods include replacement or reenactment [1,2], which are frequently referred to as 'face swapping' or 'deep-fakes', retouching [3,4] as well as morphing [5,6]. Morphing techniques can be used to create artificial face images that resemble the biometric information of two (or more) subjects in the image and feature domain. Usually, the morphing process comprises the definition of corresponding landmarks, averaging, triangulation, warping, and alpha-blending [5]. Alternatively, morphs might as well be created using generative adversarial networks (GANs) [7]. An example of a morphed facial image is shown in Fig. 1b. With high probability, the morphed facial image is successfully verified against probe samples from both subjects contributing to the morph using state-of-the-art FRSs. This means that if a morphed facial image is somehow stored as a reference in the database of an FRS or in the chip of an electronic travel document, both subjects involved can successfully be verified against this manipulated reference. Morphed facial images thus pose a serious threat to FRSs as the basic principle of biometrics, i.e. the unambiguous link between biometric data and the subject, is broken.
In 2014, Ferrara et al. [8] were the first to thoroughly investigate the vulnerability of a commercial FRS against face morphing attacks. Since then, a considerable number of morphing attack detection (MAD) mechanisms have been published. For a comprehensive survey, the reader is referred to [5]. Proposed approaches can be categorised according to the MAD scenario. In the no-reference MAD scenario, the detector processes a single image, e.g. an image that is presented in a passport application procedure (this scenario is also referred to as single image MAD or forensic MAD). In contrast, in the differential MAD scenario, a trusted live capture from an authentication attempt serves as an additional source of information for the morph detector, e.g. during authentication at an automatic border control gate (this scenario is also referred to as image pair-based MAD). Note that all information extracted by no-reference morph detectors might as well be leveraged within this scenario [9].
In this work, the focus is put on the more challenging no-reference scenario. A comprehensive evaluation of two different face databases using four morphing algorithms and four post-processing methods is conducted. It is shown that a fusion of multiple configurations of multi-scale block LBP (MB-LBP) improves the performance as well as the robustness of the MAD system. Further, the proposed fusion-based scheme that combines the complementary information extracted from various scales outperforms diverse published approaches. Moreover, as opposed to existing works, it is shown that MAD remains a challenging task in real-world scenarios where the image source and/or the algorithm used to morph face images is unknown to the detection system.
The remainder of this paper is organised as follows: In Section 2, the related work is briefly revisited. Subsequently, the used image databases are summarised in Section 3. In Section 4, the proposed system is described in detail. An in-depth evaluation is presented in Section 5. Finally, a conclusion is given in Section 6.

Related work
In general, no-reference face morphing attack detectors can be divided into three algorithm classes which utilise either (i) texture descriptors, (ii) digital forensics, or (iii) deep learning (see Table 1).
During the morphing process, various artefacts are created, which can be detected by analysing the texture. Due to the averaging of two images, the resulting morph is smoothed, e.g. the skin textures will lose their sharpness. Furthermore, ghost artefacts or half-shade effects occur if the two morphed images are not aligned correctly and if there are too few or incorrectly positioned landmarks. These artefacts occur more frequently in the area of the pupils and the nostrils. Other artefacts detectable by texture descriptors are distorted corners and offset image areas. In several publications, the use of common texture descriptors, e.g. local binary patterns (LBPs) [38] or binarised statistical image features (BSIFs) [39], has already been demonstrated [7,9-11,21]. An extension of these algorithms to several colour channels [13,14] or higher dimensions [22] can lead to further improvements. Other texture descriptors, such as unified LBPs (ULBPs) [17,18] or weighted local magnitude patterns (WLMPs) [16], have also been tested.
The distortion and blending during the morphing process influence the high-frequency information of the image. These changes can be analysed by image forensics-based detection methods. For example, it has been shown that morphs can be detected by analysing photo response non-uniformity (PRNU) [24-27] or sensor pattern noise (SPN) [28]. Moreover, the quality of the images is reduced by editing and saving them in the morphing process. Under the assumption that the quality of morphed images is always lower than that of bona fide images, image quality can be used for morph detection. This can be done by either analysing intentional degradation of the image in question [23] or by using several quality features in combination with a classifier [31]. Under the assumption that the images are stored in a lossy compression format before and after morphing, it is possible to detect morphs by analysing double compression artefacts [19,29]. Furthermore, the images can be examined for inconsistencies, e.g. for non-natural lighting conditions or colour values [30].
The third class of no-reference algorithms is based on deep learning. Deep learning-based feature extractors offer the advantage that they can theoretically learn to detect any artefact present in the training set. This, however, carries the risk of overfitting to artefacts which only occur with the morphing algorithms used for training and are therefore not generally valid. One possibility is the training or transfer learning of a network for the detection of morphs [33,34]. Another possibility is the use of pretrained neural networks for feature extraction in combination with a classifier (e.g. a support vector machine, SVM) [32,37]. Deep features can also be combined with other features (e.g. LBP) [35].
While the majority of morph detection approaches report practical detection error rates, these are commonly evaluated on a dataset of bona fide and morphed face images which are extracted from a single (in-house) face database and created by a single morphing algorithm. It was shown that variations in the dataset [20], morphing process, and post-processing (e.g. print and scan [11]) might negatively influence the performance of the morph detection algorithms. This has also been confirmed in the face recognition vendor test (FRVT) MORPH conducted by the National Institute of Standards and Technology (NIST) [40]. In [12], a fusion of multiple algorithms was proposed, as it might improve the detection performance of no-reference algorithms. Even the fusion of different configurations of the same algorithm was found to be beneficial.

Database
The results of this work were obtained based on subsets of the FRGCv2 [41] and FERET [42] face image databases. From these databases, potential reference images meeting the International Civil Aviation Organization (ICAO) passport photo quality standards [43] are selected. From the pre-selected images, image pairs are created for the morphing process. Where possible, different images are used for morphing and as bona fide samples. However, for some subjects, there are not enough samples, so the same image is used in both subsets (morphed and bona fide). The number of used subjects and bona fide images as well as the number of created morphs are given in Table 2.

Morphing
Different morphing algorithms produce morphs with different artefacts. For a comprehensive evaluation, a database with different morphing algorithms is therefore necessary to ensure that the morph attack detection algorithms have not overfitted to algorithm-specific artefacts. For this purpose, four morphing algorithms were used to ensure a large variation of morphs. Examples with equal contribution of both subjects are shown in Fig. 2:
(i) FaceFusion [wearemoment.com/FaceFusion/], a proprietary morphing algorithm. Due to the inaccessible source code, it is not possible to determine in which way the morphs are generated. It can be seen that, after the morphing process, parts of the first subject are blended over the morph to hide artefacts (eyes, nostrils, outer facial region). The created morphs have a high quality and few to no visible artefacts.
(ii) FaceMorpher [github.com/alyssaq/face_morpher], an open-source implementation using Python. In the used version, the algorithm applies STASM for landmark detection. Delaunay triangles are formed from the landmarks, which are distorted and blended. The area outside the landmarks is averaged. The generated morphs show strong artefacts, in particular in the area of neck and hair.
(iii) OpenCV, a self-made morphing algorithm based on the tutorial 'Face Morph Using OpenCV' [www.learnopencv.com/face-morph-using-opencv-cpp-python/]. This algorithm works similarly to FaceMorpher. The important differences are that Dlib is used instead of STASM for landmark detection and that additional landmarks are positioned at the edge of the image, which are also used to create the morphs. Thus, in contrast to FaceMorpher, the edge does not consist of an averaged image but, like the rest of the image, of morphed triangles. However, strong artefacts outside the facial area can be observed in this version as well, which is mainly due to missing landmarks.
(iv) UBO-Morpher, the morphing tool of the University of Bologna, as used e.g. in [44]. This algorithm receives two input images as well as the corresponding landmarks. Dlib landmarks were used for this morphing tool. The morphs are generated by triangulation, averaging and blending. To avoid the artefacts in the area outside the face, the morphed face is copied to the background of one of the original images. Even if the colours are adjusted, visible edges may appear at the transitions.
In order to be able to conduct a fair benchmark in our experiments, the same combination of morphed face images was created for each of the listed algorithms.

Post-processing
In addition to ICAO compliance, various post-processings of the images must also be taken into account, since the images of the database aim at imitating the real-world application process for an electronic travel document. In many countries, the images are down-scaled, e.g. to 360 × 480 pixels, and compressed, e.g. to 15 kB using JPEG2000, prior to storing them on the chip of an electronic travel document, e.g. an ePassport. In addition, the images can be handed over in printed form by the applicant. It can be assumed that morphs are easier to detect in unprocessed images and that each post-processing step increases the difficulty of reliable detection. In order to cover these realistic scenarios, the following post-processings have been applied:
(i) Resized: The resolution of the images is reduced to the minimum inter-eye distance (90 px) required by the ICAO guidelines for electronic travel documents [43]. This post-processing corresponds to the scenario that an image is submitted digitally by the applicant. An example is shown in Fig. 3a. This post-processing is applied in advance of all subsequent post-processings described below.
(ii) JPEG2000 Compression (J2): A wavelet-based image compression method that is recommended for electronic travel documents [45]. The setting is selected in a way that a target file size of 15 kB is achieved. This post-processing corresponds to the scenario that a digitally submitted image is stored in the chip of the electronic travel document. An example is shown in Fig. 3b.

Validation of attack potential
To assure the significance of the following experiments, the attack potential of the created databases is evaluated in a first step. For this purpose, comparison scores for genuine and impostor comparisons, as well as for morphing attacks, are determined, and the mated morph presentation match rate (MMPMR) and the relative morph match rate (RMMR) defined in [46] are estimated. The FRGCv2 database provides probe images showing a significantly higher variance (and therefore higher realism) compared to the probe images contained in the FERET database; thus, the validation of the attack potential is limited to the FRGCv2 database. Due to the lower variance of the sample images, comparisons on the FERET database result in higher comparison scores for genuine and morph attack comparisons, so the results obtained on FRGCv2 can be considered a lower limit for the attack potential. The comparison scores were generated using a commercial off-the-shelf (COTS) FRS. The resulting probability density functions (PDFs) are depicted in Fig. 4. In most publications, databases with symmetric morphs are used. This means that both subjects contribute equally to the creation of the morph. However, it is also suggested, e.g. in [44], to assign a lower weight to one subject in order to increase the chances of the other subject in the case of a manual control. For this reason, in addition to the PDFs of symmetrical morphs in Fig. 4b, the distributions of asymmetrical morphs with a weighting of 25 and 75% (α = 0.25) are shown in Fig. 4a. The corresponding MMPMR and RMMR are listed in Table 3. Since the FRS maintains a zero FNMR at the considered FMR of 0.1%, the MMPMR is equal to the RMMR. However, it is evident that the asymmetric morphs, regardless of the applied morphing algorithm, have no attack potential for the used FRS. This behaviour is reinforced by the realistic variance of the probe images used. As a consequence, only symmetrical morphs are considered in this paper.
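The two vulnerability metrics can be illustrated with a short sketch. It assumes the commonly cited form of the definitions in [46]: a morph counts as a successful attack only if the lowest of its comparison scores against the contributing subjects exceeds the decision threshold, and RMMR = 1 + (MMPMR − (1 − FNMR)). The function names, scores and threshold are hypothetical, and higher scores are assumed to indicate higher similarity.

```python
def mmpmr(morph_scores, threshold):
    """Fraction of morphs whose weakest mated comparison still exceeds the threshold.

    morph_scores: one list per morph, holding the similarity scores against
    probe images of each contributing subject (assumed layout).
    """
    accepted = sum(1 for scores in morph_scores if min(scores) > threshold)
    return accepted / len(morph_scores)


def fnmr(genuine_scores, threshold):
    """Fraction of genuine comparisons that are rejected at the threshold."""
    rejected = sum(1 for s in genuine_scores if s <= threshold)
    return rejected / len(genuine_scores)


def rmmr(morph_scores, genuine_scores, threshold):
    """Assumed form: RMMR = 1 + (MMPMR - (1 - FNMR)) = MMPMR + FNMR."""
    return mmpmr(morph_scores, threshold) + fnmr(genuine_scores, threshold)


# Hypothetical scores for four morphs of two subjects each.
morphs = [[0.8, 0.7], [0.9, 0.4], [0.6, 0.65], [0.3, 0.2]]
genuine = [0.9, 0.85, 0.7, 0.95]
tau = 0.5  # hypothetical threshold, e.g. the one fixed at FMR = 0.1%

print(mmpmr(morphs, tau), rmmr(morphs, genuine, tau))
```

With a zero FNMR, as in this toy data, both functions return the same value, which mirrors the observation above. Taking the minimum over the contributing subjects also shows why strongly asymmetric morphs tend to fail: the subject with the lower weight usually determines the outcome.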

Proposed system
The proposed system, which is depicted in Fig. 5, comprises three key modules: (i) MB-LBP extraction, (ii) cell division, and (iii) training and score-level fusion. In the following subsections, all modules are described in detail. To avoid algorithm overfitting on avoidable artefacts, e.g. ghost artefacts in hair regions, the image is cropped to a size of 320 × 320 pixels using predefined offsets, whereby the image area showing the face is cut out. Finally, the cropped face part is converted to a greyscale image.
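The pre-processing step can be sketched as follows. The offsets, the synthetic test image and the greyscale weights (the widely used ITU-R BT.601 luma coefficients) are assumptions for illustration; the actual offsets and conversion used in this work are not specified here.

```python
def crop_and_grey(rgb, top, left, size=320):
    """Cut a size x size window out of an RGB image and convert it to greyscale.

    rgb: nested list with rgb[y][x] = (r, g, b); top and left are hypothetical
    crop offsets in pixels.
    """
    if top < 0 or left < 0 or top + size > len(rgb) or left + size > len(rgb[0]):
        raise ValueError("crop window exceeds image bounds")
    return [
        # Weighted sum of the colour channels (assumed BT.601 luma weights).
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row[left:left + size]]
        for row in rgb[top:top + size]
    ]


# Synthetic image, 360 pixels wide and 480 pixels high.
image = [[(x % 256, y % 256, 128) for x in range(360)] for y in range(480)]
face = crop_and_grey(image, top=80, left=20)
print(len(face), len(face[0]))
```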

Multi-scale block LBP
LBP is a powerful feature for texture classification. Specifically, LBP is suitable for detecting morphed face images in no-reference scenarios [12]. The distortions introduced by the morphing process change the texture of the images in a way that can be detected in an LBP-histogram. Further, the images are averaged during the blending, which smooths the resulting morph, leading to less sharp edges, which are reflected in an LBP-histogram, too. In addition, the morphing process might introduce minor artefacts to the image [46]. As LBP is designed for the representation of surface properties, these artefacts can be represented in the LBP-histogram as well and can be utilised to detect morphed face images. The original LBP operator labels the pixels of an image by thresholding the 3 × 3-neighbourhood of each pixel with the centre value and considering the result as a binary string or a decimal number. Then the histogram of extracted LBP values can be used as a texture descriptor. MB-LBP [47] is an extension of the basic LBP with respect to neighbourhoods of different sizes. In MB-LBP, the comparison between single pixels in LBP is replaced with a comparison between average pixel intensities of sub-regions. Each sub-region is a square block containing neighbouring pixels. In each sub-region, the average pixel intensity is computed. These averages are then thresholded by that of the centre block. The whole filter is composed of nine blocks (centre block and eight neighbouring blocks), each of size (2k + 1) × (2k + 1) pixels. If a higher value for k is selected, details are lost while robustness increases [47]. An example of the basic LBP and the MB-LBP operator is shown in Fig. 6. In order to be able to compute the LBP blocks in the peripheral regions, border rows and columns that replicate the outer pixel values are added to the image in advance.
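The operator described above can be sketched in a few lines. This is a minimal illustration and not the implementation used in this work: it uses plain Python lists to stay self-contained (an array library would be considerably faster), the bit order of the eight neighbours is an arbitrary convention, and a neighbour block is assumed to yield a 1 if its mean is greater than or equal to the centre mean.

```python
def mb_lbp(img, k):
    """Return the MB-LBP code image for blocks of (2k + 1) x (2k + 1) pixels.

    img: greyscale image as nested list img[y][x]; k = 0 gives the basic 3 x 3 LBP.
    """
    s = 2 * k + 1            # block edge length
    pad = s + k              # half of the 3s x 3s filter, so every pixel gets a code
    h, w = len(img), len(img[0])

    # Replicate the outer pixel values by clamping coordinates to the image.
    padded = [
        [img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
         for x in range(-pad, w + pad)]
        for y in range(-pad, h + pad)
    ]

    # Integral image: ii[y][x] holds the sum of all padded pixels above and left.
    ph, pw = h + 2 * pad, w + 2 * pad
    ii = [[0] * (pw + 1) for _ in range(ph + 1)]
    for y in range(ph):
        row_sum = 0
        for x in range(pw):
            row_sum += padded[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum

    def block_sum(cy, cx):
        """Sum of the s x s block centred at padded coordinates (cy, cx)."""
        y0, x0 = cy - k, cx - k
        y1, x1 = y0 + s, x0 + s
        return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

    # Neighbour blocks, clockwise from top-left (assumed bit order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

    codes = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            cy, cx = y + pad, x + pad
            centre = block_sum(cy, cx)
            code = 0
            for dy, dx in offsets:
                # All blocks have equal area, so comparing sums equals comparing means.
                code = (code << 1) | (block_sum(cy + dy * s, cx + dx * s) >= centre)
            codes[y][x] = code
    return codes


sample = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(mb_lbp(sample, 0)[1][1])  # neighbours 6, 9, 8 and 7 are >= 5, giving 0b00011110 = 30
```

The integral image keeps the cost per pixel independent of k, which matters when the same image is processed at six different scales.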

MB-LBP feature extraction over multiple cells
Even if the performance of LBP in constrained scenarios is promising, the detection performance of LBP degrades considerably when the face images are post-processed, e.g. by printing and scanning. Further, it was observed that smaller blocks show a higher performance on single databases, but larger blocks are more robust in a cross-database analysis [20]. Scherhag et al. [12] have shown that a fusion of two LBP configurations might lead to increased performance and robustness of the algorithm.
After the computation of the MB-LBP values, the resulting image is divided into c × c cells. For each cell a histogram is calculated, and the individual histograms are concatenated to a longer MB-LBP feature vector. As c increases, so does the number of concatenated histograms and thus the size of the feature vector. The benefit is that more local information is retained.
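The cell division can be sketched as follows, assuming 256 histogram bins (one per possible 8-bit code) and per-cell normalisation; both choices are assumptions made for this illustration, as is the function name.

```python
def cell_histograms(codes, c, bins=256):
    """Split a code image into c x c cells and concatenate the per-cell histograms."""
    h, w = len(codes), len(codes[0])
    feature = []
    for i in range(c):
        # Integer cell borders also cover sizes that are not divisible by c.
        y0, y1 = i * h // c, (i + 1) * h // c
        for j in range(c):
            x0, x1 = j * w // c, (j + 1) * w // c
            hist = [0] * bins
            for row in codes[y0:y1]:
                for value in row[x0:x1]:
                    hist[value] += 1
            area = (y1 - y0) * (x1 - x0)
            # Normalise so that cells of slightly different size are comparable.
            feature.extend(count / area for count in hist)
    return feature


# Hypothetical 6 x 6 code image.
codes = [[(x + y) % 256 for x in range(6)] for y in range(6)]
feature = cell_histograms(codes, 3)
print(len(feature))  # 9 cells with 256 bins each
```

Under these assumptions the feature vector grows with c², from 256 values for c = 1 to 2304 values for c = 3.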
Thus, MB-LBP feature extraction is applied to the post-processed image in different configurations. The configurations consist of the possible combinations resulting from the values for k and c. Values from 0 to 5 are selected for k, since too much information is lost with even larger values. The picture is divided into a maximum of 3 × 3 cells (c ∈ {1, 2, 3}), since otherwise the ratio between the patch size and the cell size becomes disproportionate.
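The resulting set of configurations can be enumerated directly from the two parameter ranges. The dictionary layout below is only an illustrative choice.

```python
from itertools import product

K_VALUES = range(0, 6)   # block edge 2k + 1: 1, 3, 5, 7, 9 and 11 pixels
C_VALUES = (1, 2, 3)     # image divided into c x c cells

configs = [
    {"k": k, "c": c, "block": 2 * k + 1, "cells": c * c}
    for k, c in product(K_VALUES, C_VALUES)
]

print(len(configs))  # 6 values of k times 3 values of c
```

Six values of k combined with three values of c give the 18 configurations that are available for fusion.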

Experiments
In the following section, the experimental setup, as well as the evaluation of the experiments, are described, including a discussion of the observed results. The performance evaluations are conducted based on the database described in Section 3.

Morph detection performance evaluation
For the performance evaluation of the described algorithm, the SVM classifiers are each trained on one post-processing and one morphing algorithm at a time using the FERET database. The evaluation is performed on the FRGCv2 database and all other morphing algorithms, resulting in 12 combinations per post-processing and 48 combinations in total. The performance of the detection algorithms is reported using the detection equal error rate (D-EER), i.e. the operating point where the proportion of attack presentations incorrectly classified as bona fide presentations (APCER) equals the proportion of bona fide presentations incorrectly classified as attack presentations (BPCER).
In a preliminary analysis, all possible MB-LBP configurations with different cell divisions described in Section 4.2 were trained and tested on images that were not post-processed. The best configuration and the corresponding error rates are listed in Table 4. It is apparent that a subdivision into more cells (c = 3) is preferred. However, no single configuration is equally suitable for all morphing algorithms and databases. For example, MB-LBP with k = 0 and c = 3 can reach a D-EER as low as 1.9% when detecting FRGCv2 FaceMorpher morphs if trained on the FERET database and images created by the UBO-Morpher algorithm, but overall this configuration only reaches a D-EER of 28.0%. In order to obtain a more robust MAD algorithm, multiple MB-LBP configurations can be fused as described in Section 4.3. In Fig. 7, a box plot over the distribution of D-EERs of all possible fusion combinations of a fixed number of algorithms is depicted. The corresponding average D-EER per number of fused combinations is listed in Table 5. The maximum number of algorithms to fuse is limited to 18, since the algorithm described in Section 4.2 allows for 18 different MB-LBP configurations. With an increasing number of fused algorithms, the median of the D-EERs is lowered, as well as the upper quartile and whisker. Thus, in the remainder of this manuscript, only the fusion of all 18 configurations is considered.
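The D-EER can be approximated from two lists of detection scores by sweeping the decision threshold, as in the following sketch. It assumes scores in the range from 0 to 1 with higher values indicating an attack; the function name and the example scores are hypothetical, and the simple quadratic-time sweep is chosen for readability rather than speed.

```python
def d_eer(attack_scores, bona_fide_scores):
    """Approximate the D-EER and the corresponding threshold.

    A sample is classified as an attack if its score is >= the threshold.
    """
    candidates = sorted(set(attack_scores) | set(bona_fide_scores))
    candidates.append(candidates[-1] + 1e-9)  # one threshold above every score
    best = None
    for t in candidates:
        # APCER: attacks classified as bona fide; BPCER: bona fide classified as attack.
        apcer = sum(s < t for s in attack_scores) / len(attack_scores)
        bpcer = sum(s >= t for s in bona_fide_scores) / len(bona_fide_scores)
        gap = abs(apcer - bpcer)
        if best is None or gap < best[0]:
            best = (gap, (apcer + bpcer) / 2, t)
    return best[1], best[2]


attacks = [0.9, 0.8, 0.7, 0.35, 0.6]
bona_fide = [0.1, 0.2, 0.3, 0.65, 0.4]
eer, threshold = d_eer(attacks, bona_fide)
print(eer, threshold)  # one of five samples is misclassified on each side at 0.6
```

With few test samples the error rates can only change in coarse steps, so the point where APCER and BPCER meet is located only approximately.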
In Table 4, the two rightmost columns show the performance of the fused algorithm. For databases and morphing algorithms that are easily detectable by single algorithms, the fused algorithm performs well, too. For subsets that are harder to detect, the performance of the fused algorithm drops, but in general, it is more robust than the single algorithms.

Comparison to state-of-the-art algorithms:
The morph attack detection performance of common state-of-the-art morph detection algorithms is listed in Table 6. Additionally, the corresponding detection error trade-off (DET) curves are depicted in Fig. 8. The algorithms used for comparison comprise an open-source face recognition framework based on a ResNet deep neural network (ArcFace [49]) and two texture descriptors, namely LBP [38] and BSIF [39], with patches of size 3 × 3 and an optional division into 4 × 4 cells.
As can be seen, the proposed MB-LBP fusion approach clearly outperforms the algorithms used for comparison. In particular, PS processed morph images are detected far better, with an average D-EER of 2.26%, whereas the best of the other algorithms yields an average D-EER of 12.95%. The same applies to resized morphs, which are detected significantly better by the proposed algorithm (5.80%) than by the other algorithms (14.78%). The challenging JPEG2000 and Print/Scan - JPEG2000 processed morphed images are also detected better by the proposed algorithm, with an average D-EER of 8.32 and 8.52%, respectively, while the best of the other algorithms yields an average D-EER of 19.77 and 22.77%, respectively. In all cases, it is the BSIF algorithm in one of its two configurations that comes closest to the performance of MB-LBP. The superiority of texture algorithms over deep learning algorithms in no-reference scenarios can also be observed in the DET plots in Fig. 8. In all four plots, it can be clearly seen that ArcFace ranks last and delivers a performance that is largely unsuitable for practical use in a no-reference scenario. Although the performance of ArcFace is poor, it should be noted that the deep learning algorithm is much less sensitive to image post-processing and can therefore be considered more robust in this respect.
In the following, it is analysed how strongly the performance of the fused algorithms is affected when training and testing are done on differently post-processed images. In addition, it is shown in which way the performance depends on the choice of the morphing algorithm used for training. For this purpose, kernel density estimation (KDE) plots are given in Figs. 9-12, showing the distribution of attack and bona fide presentations over a range from 0 to 1, with 0 meaning bona fide and 1 meaning attack. Each pair of thin green and red curves indicates the performance for one morphing algorithm that has been used for training. The thick red and green curves depict the mean performance across all training algorithms, with the D-EER line (dashed vertical line) indicating the threshold that separates the attack and bona fide presentations on average. It should be noted that the D-EER line differs for the individual post-processings, each representing a different application scenario. It appears, however, that J2 compression dominates, so that for J2 and PS-J2 the equal error is reached at the same threshold (0.4).

Resized:
The detection performance rates are shown in Table 7. If the resolution of the images is reduced by half, the average D-EER improves by almost 50% compared to the results of the preliminary analysis, to 5.80%. This can be explained by the fact that the reduction of the resolution, and thus the deletion of high-frequency information, aligns the two databases regarding their image structure, such as image noise. In particular, for a smaller value of k it is more likely that irrelevant information owed to the image acquisition format is taken into account during training. It can, therefore, be assumed that training on resized images is more likely to consider information that is actually caused by the morphing process.
As shown in the KDE plot in Fig. 9, the dashed red curve denoting FaceMorpher is far to the left of the EER line, indicating that probably many of the attack images are misclassified as bona fide. In this case, training on FaceMorpher could therefore be detrimental to morph attack detection performance.
The performance deteriorates significantly when training is done with all morphing algorithms, but at the same time, it becomes more robust.

Print/Scan:
The detection performance rates are shown in Table 8. Similar to the resized scenario, the printing and subsequent scanning of the morphed images seems to result in an extensive elimination of image capture format-dependent information, so that morphing-specific information is again more likely to be considered during training. This explains the good average performance of 2.26%.
Again, the KDE plot in Fig. 10 clearly shows that training on FaceMorpher yields the least competitive results. The corresponding red curve lies to a large extent to the left of the EER line. As the corresponding bona fide curve is also shifted to the left, the D-EER of 7.60% is still acceptable when training is done on FaceMorpher.
However, the step-like appearance of the MB-LBP plot shown in Fig. 8b and the straight sections on both sides of the curve indicate that the statistical significance is limited in this case, which is due to the size of the database used for testing in connection with the very good morph attack detection performance. Since the selection used for testing from the FRGCv2 database contains only 2405 images, the number of incorrectly classified images is very low overall, meaning that even a few misclassifications can have a large impact on the resulting D-EER. In future work, it could therefore be investigated whether the performance determined here can also be verified when testing is done with significantly larger databases.

JPEG2000:
The detection performance rates are shown in Table 9. When compressing the images using the JPEG2000 method, so much information is lost that the effect from the two previous scenarios does not occur. With an average of 8.32%, the D-EER is roughly 43% higher than in the resized scenario (5.80%). Also, the performance is not as robust as in the other scenarios, with values ranging between 7.34% (FaceFusion) and 21.87% (FaceMorpher). Fig. 11 shows that, as in the previous scenarios, the red curve indicating FaceMorpher lies far to the left of the EER line. In this case, it lies even further to the left than the thick green curve, which represents the average performance, explaining the high D-EER of 21.87% when training is done on FaceMorpher. However, as Fig. 8c clearly shows, the MB-LBP algorithm still performs significantly better than all state-of-the-art algorithms compared.

Print/Scan - JPEG2000:
The detection performance rates are shown in Table 10. Compressing the printed and scanned images using the JPEG2000 algorithm considerably deteriorates the morph attack detection performance, to an average D-EER of 8.52%. However, it can also be seen that the performance is hardly reduced compared to the JPEG2000 scenario, as the D-EER increases by only 0.2 percentage points. Also, the DET plot (Fig. 8d) and the KDE plot (Fig. 12) of the two scenarios look very similar. This indicates that it is in fact only the JPEG2000 compression that affects performance, and that printing and subsequent scanning does not further degrade the morph attack detection performance.

Generalisation:
Looking at how the detection performance changes when different post-processings are used for training and evaluation, the trend already observed becomes apparent again. As can be seen in Table 11, PS images are always detected best. When evaluating J2 images, the performance deteriorates significantly. While J2 images are only poorly detected, training on them provides relatively robust results. It can be assumed that, depending on the application scenario, training with different post-processings is preferable. Future research might consider the potential effect on the robustness of the attack detection performance when training on a combination of multiple post-processings.

Morph attack detection combined with the FRS
It is worth investigating to what extent the detection performance of the proposed system is influenced by the use of an FRS. Using a COTS FRS, the threshold, which decides whether an image is accepted or rejected, is selected in such a way that an FMR of 0.1% is achieved. The proposed morphing detection system is applied for each reference face image which has been part of a biometric match of the FRS. If the FRS rejects many of the supposedly easier to detect morphs, these attacks fall out of the set of attacks that the MAD algorithm has to detect, which could lead to a relative deterioration of the detection performance. On the other hand, the bona fide images, which are incorrectly rejected by the FRS, also fall out of the set of images that the MAD algorithm has to examine. For this experiment, the MAD system operates at the threshold of the D-EER point. As can be seen in Fig. 13, the FRS incorrectly accepts over 90% of the attacks of which 77% are rejected by the MAD, resulting in an APCER of 23%. It might also be of interest to note that nearly 20% of the morphs rejected by the FRS would have been accepted by the MAD, indicating that some of the morphs which are more easily detected by the FRS are potentially somewhat more difficult for the proposed system to detect. This shows the potential that lies in a combination of the two systems. However, 100% of the bona fide images are recognised as such by the FRS, while the BPCER of the MAD system lies at 23%. Therefore, APCER and BPCER are still identical, and the average detection performance of the proposed system did not deteriorate despite the preceding use of the FRS.
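The evaluation logic of this experiment can be sketched as follows. The data layout, the function name and the scores are hypothetical; the sketch only illustrates how the error rates are computed on the subset of reference images that have passed the FRS, and it assumes that higher FRS scores mean higher similarity and higher MAD scores indicate an attack.

```python
def cascade_rates(samples, frs_threshold, mad_threshold):
    """Evaluate MAD only on references that were matched by the FRS.

    samples: iterable of (is_attack, frs_score, mad_score) triples.
    Returns (share of attacks accepted by the FRS, APCER, BPCER), where the
    latter two are measured on FRS-accepted samples only.
    """
    attacks = [s for s in samples if s[0]]
    bona_fide = [s for s in samples if not s[0]]
    acc_attacks = [s for s in attacks if s[1] >= frs_threshold]
    acc_bona_fide = [s for s in bona_fide if s[1] >= frs_threshold]

    def rate(hits, population):
        return hits / len(population) if population else 0.0

    frs_accept = rate(len(acc_attacks), attacks)
    apcer = rate(sum(s[2] < mad_threshold for s in acc_attacks), acc_attacks)
    bpcer = rate(sum(s[2] >= mad_threshold for s in acc_bona_fide), acc_bona_fide)
    return frs_accept, apcer, bpcer


# Hypothetical (is_attack, frs_score, mad_score) triples.
data = [
    (True, 0.9, 0.8), (True, 0.8, 0.3), (True, 0.7, 0.9), (True, 0.2, 0.1),
    (False, 0.95, 0.2), (False, 0.9, 0.6), (False, 0.85, 0.1), (False, 0.9, 0.3),
]
frs_accept, apcer, bpcer = cascade_rates(data, frs_threshold=0.5, mad_threshold=0.5)
print(frs_accept, apcer, bpcer)
```

In this toy data, the one morph rejected by the FRS has a low MAD score and would therefore have passed the MAD, which corresponds to the complementary behaviour of the two systems noted above.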

Conclusion
In this paper, the performance of MB-LBP on morphs created by four different morphing algorithms is evaluated. In addition, the images were post-processed in various ways. The performance of single algorithms highly depends on the morphs used for training and testing. The robustness of the algorithms over different morphing algorithms and databases can be increased by the fusion of multiple MB-LBP configurations and different cell divisions. Further, it is demonstrated that the robustness increases with the number of fused algorithms. Training on multiple attack types leads to a more robust morph detection performance and, therefore, lower error rates in some cases. The proposed MB-LBP fusion approach outperforms most of the state-of-the-art no-reference morph detection algorithms. Finally, this paper emphasises the need for robust morphing detection algorithms and diverse databases comprised of different image sources and morphing algorithms in order to reliably train and evaluate face MAD algorithms.