ActiveGuard: An active intellectual property protection technique for deep neural networks by leveraging adversarial examples as users' fingerprints
Abstract
The intellectual property (IP) protection of deep neural network (DNN) models has raised many concerns in recent years. To date, most of the existing works use DNN watermarking to protect the IP of DNN models. However, DNN watermarking methods can only passively verify the copyright of the model after the DNN model has been pirated, and cannot prevent piracy in the first place. In this paper, an active DNN IP protection technique against DNN piracy, called ActiveGuard, is proposed. ActiveGuard can provide active authorisation control, users' identities management, and ownership verification for DNN models. Specifically, for the first time, ActiveGuard exploits well-crafted rare and specific adversarial examples with specific classes and confidences as users' fingerprints to distinguish authorised users from unauthorised ones. Authorised users can input their fingerprints to the DNN model for identity authentication and then obtain normal usage, while unauthorised users will obtain a very poor model performance. In addition, ActiveGuard enables the model owner to embed a watermark into the weights of the DNN model for ownership verification. Compared to the few existing active DNN IP protection works, ActiveGuard can support both users' identities identification and active authorisation control. Besides, ActiveGuard introduces lower overhead than these existing active protection works. Experimental results show that, for authorised users, the test accuracy of the LeNet-5 and Wide Residual Network (WRN) models is 99.15% and 91.46%, respectively, while for unauthorised users, the test accuracy of the LeNet-5 and WRN models is only 8.92% and 10%, respectively. Besides, each authorised user can pass the fingerprint authentication with a high success rate (up to 100%). For ownership verification, the embedded watermark can be successfully extracted, while the normal performance of DNN models will not be affected. Furthermore, it is demonstrated that ActiveGuard is robust against the model fine-tuning attack, the pruning attack, and three types of fingerprint forgery attacks.
1 INTRODUCTION
Deep neural networks (DNN) have been widely used in various fields, such as speech recognition, biometric authentication, and autonomous driving. Training a high-performance DNN model is costly and time-consuming, as it requires expensive hardware resources, a large amount of training data, and expert knowledge [1-4]. Therefore, a trained DNN model can be regarded as an intellectual property (IP) of the model owner. For example, machine learning as a service [5] is a very popular business paradigm now. However, malicious users/pirates may illegally copy, redistribute, or abuse the DNN models without permission [3, 4, 6, 7]. IP protection for DNN models is an emerging problem, which has attracted increasing concern recently.
However, protecting the IP of DNN models is a completely new problem, and the existing copyright protection techniques in the multimedia area cannot be applied to DNNs directly [4, 8]. Since DNN models have complex structures and a large number of parameters, and are often deployed in black-box scenarios, it is considerably challenging to protect the IP of DNN models.
In recent years, many DNN watermarking works have been proposed to protect the IP of DNN models. However, the DNN watermarking methods can only passively verify the copyright of the DNN model after the model is pirated, which cannot prevent piracy in advance. Besides, the DNN watermarking methods cannot identify and manage different users' identities, and thus cannot meet the requirements of practical commercial applications.
To date, very few active authorisation control works [3, 9, 10] have been proposed to protect the IP of DNNs, in which authorised users can obtain a high model performance, while illegal users will obtain a poor model performance. However, the work [3] requires an extra anti-piracy transformation module to verify whether a user is legal or not. To use the DNN model normally, authorised users have to pre-process each input with the transformation module, which introduces a high computational overhead. The work [9] embeds multiple passport layers into the DNN model, which introduces a high overhead. Besides, the passport-based method is vulnerable to reverse-engineering attack and tampering attack. The hardware-assisted method [10] relies on trusted hardware devices as the root-of-trust to store the key for each user, which is costly. In addition, these existing active authorisation control works [3, 9, 10] do not support users' identities management, and thus cannot meet the requirements of practical commercial applications.
In this paper, we aim at actively protecting the IP of DNN models, which can prevent the occurrence of DNN piracy in the first place, and at providing users' identities management. For the first time, we propose an active IP protection technique for DNN models via adversarial examples based user fingerprinting, named ActiveGuard. The proposed method can provide active authorisation control, users' fingerprints management, and ownership verification. To provide users' fingerprints management, specific adversarial examples are generated as users' fingerprints, where each authorised user is assigned an adversarial example as his fingerprint/identity. Then, the authorised user can input his unique adversarial example into the DNN model for identity verification. To provide authorisation control, a control layer is added to the DNN model. The control layer can restrict the usage of unauthorised users on the protected DNN model, that is, the DNN model is dysfunctional for unauthorised users. To provide ownership verification, a numerical watermark is embedded into the weights of the DNN with a parameter regulariser. When the DNN model is suspected of being pirated, the model owner can extract the embedded watermark to verify ownership.
We have published a previous conference paper [11] in the AAAI workshop on Practical Deep Learning in the Wild, and this paper is the extended version of that conference paper [11]. Compared with our previous conference version [11], the new materials and contributions in this extended version are as follows: (i) Comparisons with existing DNN copyright protection methods are presented, including discussions in Section 2 and experimental comparisons in Section 4.6; (ii) The robustness of the proposed ActiveGuard method against three fingerprint forgery attacks is evaluated, including using clean images as fake fingerprints (FPclean), using fake fingerprints generated by the Fast Gradient Sign Method (FGSM) [12] (FPFGSM), and using fake fingerprints generated by the C&W [13] method (FPCW); (iii) More experimental results and more parameter discussions are presented in Sections 4.2-4.4; (iv) The related works about DNN copyright protection are reviewed in Section 2; (v) The active authorisation control process is presented in Figure 2; (vi) Some examples of users' fingerprints generated from the MNIST [14] and CIFAR-10 [15] datasets are presented in Figure 3; (vii) The process of users' fingerprints allocation is illustrated in Figure 4; (viii) A detailed mathematical description of users' fingerprints authentication is presented in Section 3.4.2; (ix) The steps of how to design the mapping function map(⋅) are presented in Section 3.5.1; (x) The proposed watermark extraction and verification algorithm is presented in Algorithm 1; (xi) A detailed introduction to the experimental setup, including the datasets, the DNN models, and the parameter settings, is presented in Section 4.1; (xii) The structure and the maximum length of the embedded watermark at each convolutional layer of the two DNNs are presented in Section 4.1.4 (Table 1, shown below).
Model | Convolutional layer | Structure of weight matrix D | Maximum length of an embedded watermark
---|---|---|---
LeNet-5 | conv 1 | (5, 5, 1, 6) | 25
LeNet-5 | conv 2 | (5, 5, 6, 16) | 150
WRN | conv 1 | (3, 3, 16, 32) | 144
WRN | conv 2 | (3, 3, 64, 64) | 576
WRN | conv 3 | (3, 3, 128, 128) | 1152
WRN | conv 4 | (3, 3, 256, 256) | 2304
- Abbreviation: WRN, wide residual network.
The main contributions of this paper are summarised as follows:

- Active IP protection for DNN. An active IP protection technique for DNN via adversarial examples based user fingerprinting is proposed. The key idea is that we regard adversarial examples with specific classes and confidences as users' fingerprints/identities, and achieve authorisation control based on the unique fingerprint of each authorised user. Experimental results on the MNIST [14] and CIFAR-10 [15] datasets show that, for authorised users, the test accuracy of the LeNet-5 [16] and wide residual network (WRN) [17] models is 99.15% and 91.46%, respectively. As a comparison, for unauthorised users, the test accuracy of the LeNet-5 and WRN models is only 8.92% and 10%, respectively.
- Advantages over existing works. Most existing DNN IP protection works are passive verification methods after the piracy occurs, while this work can provide active copyright protection and copyright management for DNN models. Compared with the few existing active authorisation control works [3, 9, 10], ActiveGuard is the first work to achieve both users' identities management and active authorisation control. Besides, ActiveGuard introduces lower overhead than these existing active authorisation control works. The comparison with related works is discussed in Section 2 and experimentally evaluated in Section 4.6.
- Users' identities management. We provide a novel users' identities management scheme based on adversarial examples, including users' fingerprints generation, users' fingerprints allocation and users' fingerprints authentication.
- Enabling ownership verification. To provide ownership verification, we design an effective DNN watermarking scheme, where a numerical watermark is embedded in the weights of DNN discretely by leveraging a regulariser. This watermarking scheme can successfully embed a large-capacity watermark in the weights of the DNN model without affecting the normal usage of the DNN model. In the experiments, the accuracy change of two watermarked DNN models is +0.03% (LeNet-5 [16]) and +0.08% (WRN [17]), respectively, which indicates that the proposed ActiveGuard method will not degrade the performance of DNN models. Compared with the existing watermarking method [1], the proposed watermark embedding method can provide a larger capacity (0–9 each bit, rather than 0/1 each bit) and is stealthier (because it is embedded discretely).
- Anti-attack capability. The proposed ActiveGuard has a strong anti-attack capability. ActiveGuard is not only robust to the fine-tuning attack and the pruning attack, but also robust against three kinds of fingerprint forgery attacks (i.e. the FPclean attack, the FPFGSM attack, and the FPCW attack). In our experiments, first, when the LeNet-5 [16] and WRN [17] models are fine-tuned for 50 epochs, ActiveGuard can still correctly extract the embedded watermark. Second, even if 90% of the parameters of the DNNs are pruned, ActiveGuard can still extract the embedded watermark successfully. Finally, ActiveGuard can resist three different fingerprint forgery attacks, and the attack success rate over 10,000 forged fingerprints only ranges from 0.01% to 0.1%.
The rest of this paper is organised as follows. Related work is reviewed in Section 2. The proposed method, including authorisation control, users' identities management and copyright verification, is elaborated in Section 3. The effectiveness and robustness of the proposed method are evaluated in Section 4. This paper is concluded in Section 5.
2 RELATED WORK
We briefly review the related work on DNN copyright protection, including white-box watermarking methods, black-box watermarking methods, and the few existing authorisation control methods.
2.1 White-box watermarking methods
Uchida et al. [1] first propose to protect the IP of DNNs by embedding a watermark. They utilise a parameter regulariser to embed an N-bit string into the weights of a DNN model. Rouhani et al. [2] embed a watermark into the probability density distribution of activation maps. The embedded watermark can be triggered by submitting specific input data, so as to verify the ownership of the model remotely. Wang et al. [18] propose to watermark the host DNN through an independent neural network and error backpropagation. Specifically, in the watermark embedding phase, the watermark is embedded in the host DNN via error backpropagation. In the watermark verification phase, the independent neural network takes the specific weights of the DNN as input and outputs the extracted watermark [18]. Abuadbba et al. [19] propose an invisible fragile watermarking method for convolutional neural networks (CNN), named DeepiSign. DeepiSign embeds a secret key and its hash value into the weights of the CNN. To verify the integrity of the CNN model, the secret key is extracted and its hash value is compared with the embedded hash value [19]. Our work is essentially different from the CNN signature method [19]. The CNN signature method [19] aims to verify the integrity of models, while this paper aims to perform active authorisation control and ownership verification to resist DNN piracy.
2.2 Black-box watermarking methods
Merrer et al. [20] embed a watermark into the DNN through adversarial examples. They slightly alter the decision boundary of the DNN, and the specific adversarial examples near the decision boundary are used as the watermark key set to remotely verify the ownership of the DNN model. Adi et al. [21] leverage the overparameterisation of DNN models and develop a backdoor-based watermark embedding method. Zhang et al. [7] develop three backdoor-based watermark embedding approaches: embedding meaningful content, embedding noise, and embedding irrelevant data, as the watermark, respectively. Zhang et al. [22] propose a watermarking approach to protect the image processing model from being pirated. They embed an invisible watermark into the output images of the model. If an attacker uses these output images to train a surrogate model, the surrogate model will also be embedded with the same watermark. Jia et al. [23] leverage a special loss function to train the target DNN model so as to embed an entangled watermark. In this way, the features of the watermark are similar to those of the normal training data. Removing the watermark from the target model will significantly degrade the model performance on normal data [23]. Maung et al. [24] develop a DNN watermarking method based on image encryption, where transformed images and original images are used to train the DNN model to embed the watermark. If an attacker wants to add his own watermark into the DNN model, the model performance will drastically degrade.
2.3 Few authorisation control works
To date, there are few authorisation control works [3, 9, 10]. Compared with the DNN watermarking methods, the authorisation control methods aim to provide high performance for authorised users, while unauthorised users cannot use the model or will obtain a poor performance. Chen and Wu [3] train an anti-piracy DNN by altering the loss function of the DNN. The anti-piracy DNN shows low accuracy for unauthorised users' inputs (raw inputs) and retains high accuracy for authorised users' preprocessed inputs (adversarial inputs). Fan et al. [9] protect the copyright of DNN models by using digital passports. Some carefully designed passport layers are hidden inside the DNN model, such that the model only retains high performance for legitimate users. Chakraborty et al. [10] develop a hardware-protected neural network framework, which exploits trusted hardware devices as the root-of-trust for authentication to protect the copyright of DNN. They leverage a key-dependent backpropagation algorithm to train an obfuscated deep learning model. As a result, only the authorised users with the trusted hardware devices (storing the key) can run the deep learning model normally [10]. The drawbacks of these authorisation control works [3, 9, 10] have been discussed in Section 1.
Compared with the above DNN copyright protection works, the proposed ActiveGuard method has the following advantages:

- (i) Active authorisation control and users' identities management. Most of the existing DNN IP protection methods are passive verification methods, that is, the IP of the model is passively verified after the infringement occurs, while this work can provide active copyright protection and copyright management for DNN. The few existing active DNN copyright protection works [3, 9, 10] only realise active authorisation control. As a comparison, the proposed method can achieve both active authorisation control and users' identities identification. These are the functions required for commercial applications. Besides, ActiveGuard introduces lower overhead than these existing active authorisation control works.
- (ii) Generate a unique fingerprint for each authorised user. For the authorisation control method in the work [3], each authorised user has to preprocess each input with a conversion module to use the target DNN, which introduces a high overhead. However, our proposed method only requires generating one adversarial example for each user, which is allocated to the authorised user as his/her unique fingerprint. Specifically, for the first time, the proposed ActiveGuard method utilises the low confidence interval and generates a unique adversarial example based on a specific class t and a fixed confidence c as a user's fingerprint.
- (iii) Low overhead. The proposed method designs a control layer and adds it to the end of the DNN, so as to restrict the usage of unauthorised users. Compared to existing active authorisation control methods [3, 9, 10], which require an extra transformation module to pre-process the input data [3], require appending passport layers after each convolutional layer [9], or require the support of costly trusted hardware devices [10], the proposed ActiveGuard method is low-overhead and low-cost, which makes it more practical for commercial applications.
- (iv) Larger capacity and a stealthier watermark embedding method. To achieve copyright verification, the proposed ActiveGuard embeds a numerical watermark into a convolutional layer. We design a watermark mapping function to map the weights of a convolutional layer to the numerical digits 0–9. In this way, compared with the watermarking method [1], the proposed watermark embedding method can provide a larger capacity (0–9 each bit, rather than 0/1 each bit). Besides, the watermark in the work [1] is embedded in continuous weights of DNN. However, the proposed watermark can be embedded discretely among the weights of a convolutional layer, which makes the embedded watermark stealthier and more flexible.
3 THE PROPOSED METHOD
3.1 Overall flow
In this section, we elaborate the proposed active IP protection method (ActiveGuard) for DNN, as shown in Figure 1. The overall flow of ActiveGuard includes four steps: (i) The model owner embeds a specific numerical watermark into the DNN model. The DNN embedded with watermark is referred to as the watermarked DNN. (ii) The model owner deploys the watermarked DNN as an online service and generates the licences (i.e. specific adversarial examples) for authorised users to achieve active authorisation control. (iii) Authorised users submit their fingerprints (i.e. adversarial examples) to the model to verify their identities, and then use the DNN model normally. On the contrary, unauthorised users will obtain low model performance due to the added control layer. (iv) When the model owner suspects that the DNN model has been pirated, he can extract the embedded watermark from the weights of a specific convolutional layer of the suspicious DNN model. If the watermark can be successfully extracted, the ownership of the suspicious DNN model is verified.

Overview of the proposed active intellectual property protection method for deep neural networks (DNN).
Specifically, ActiveGuard provides the following four functions, which are elaborated in Sections 3.2-3.5, respectively:

- Authorisation control: distinguishing authorised users from unauthorised users and providing normal model performance only to authorised users.
- Users' fingerprints generation: generating fingerprints for different authorised users.
- Users' fingerprints management: distinguishing different authorised users, which includes users' fingerprints allocation and users' fingerprints authentication.
- Copyright verification: supporting copyright verification for the model owner, which includes watermark embedding, watermark extraction and verification.
3.2 Authorisation control
As shown in Figure 2, the active authorisation control process works as follows:

- (i) An adversarial example is assigned to an authorised user as his fingerprint/identity. The proposed ActiveGuard method generates each adversarial example based on a specific class t with a fixed confidence c. In this way, each adversarial example is unique, and thus can represent the unique identity of each authorised user.
- (ii) Users submit their fingerprints (i.e. adversarial examples) to the DNN model for identity authentication before using the DNN model. Specifically, we design a control layer and add it to the end of the DNN model to restrict the usage of unauthorised users on the DNN model.
- (iii) For authorised users with legal fingerprints, ActiveGuard will reload the DNN model, and the added control layer will be automatically removed. As a result, the authorised users can use the DNN model normally. However, for unauthorised users, the DNN with the control layer will output randomly predicted results, thus leading to poor model performance.

The process of active authorisation control.
To realise the above design, three key questions need to be answered:

- Q1: How does the DNN model distinguish between authorised and unauthorised users?
- Q2: How to generate a unique adversarial example that represents the identity of each user?
- Q3: How does the DNN distinguish between different authorised users?
In this section, we discuss how to solve Q1. The solutions to Q2 and Q3 will be presented in Sections 3.3 and 3.4, respectively.
The proposed ActiveGuard exploits the difference of confidences to distinguish authorised users from unauthorised users. In general, for a high-performance DNN model, clean inputs will be classified as their ground-truth labels with high confidences, while adversarial examples will be classified as their target classes with high confidences. In other words, very few inputs will be classified as a class t with a low confidence (below 0.50). Inspired by the above observation, this paper utilises the low confidence interval (ranging from 0.10 to 0.50) to achieve active authorisation control. In other words, rare and specific adversarial examples are generated and used as users' fingerprints, which are not only difficult to forge, but also ensure that common adversarial examples will not accidentally pass the fingerprint authentication. First, we select some fixed confidences (such as 0.20, 0.30 and 0.40) from the low confidence interval [0.10, 0.50), and use these selected confidences to construct a set Cfp, that is, Cfp = {0.20, 0.30, 0.40}. Second, the set Cfp is utilised to distinguish authorised users from unauthorised users. Specifically, we generate adversarial examples that are classified by the DNN model as specific target classes, while their classification confidences are all in the set Cfp. In this way, when a user submits his fingerprint (i.e. adversarial example) to the protected DNN model, he would be regarded as an authorised user if his fingerprint is classified as a specific target class with a confidence in the set Cfp. Otherwise, the user is considered to be an unauthorised user.
In addition, to achieve authorisation control, we design a control layer based on the Lambda Layer [25], and add the control layer to the end of the DNN model. The control layer involves several control conditions and tensor (multi-dimensional vector) operations, and the implementation of the control layer is as follows: (i) Receive the predicted class and the confidence vector propagated by the output layer, and extract the highest confidence (i.e. the confidence of the predicted class) in the confidence vector. (ii) Calculate the errors between the highest confidence and each confidence in the set Cfp. The minimal error among these calculated results is denoted as Ec. If Ec is less than the tolerable error, the user is considered to be an authorised user. Otherwise, the user is considered to be an unauthorised user. (iii) Output a randomly predicted result to the unauthorised user. For authorised users, the model will be reloaded automatically and the added control layer will be removed to provide normal model performance.
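For illustration, a minimal sketch of such a control layer is given below, built on the Keras Lambda layer [25] and assuming a classifier with a softmax output; the names C_FP and TOLERANCE and the random-output strategy for unauthorised inputs are illustrative assumptions rather than the paper's exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

C_FP = tf.constant([0.20, 0.30, 0.40])   # legal fingerprint confidences (set Cfp)
TOLERANCE = 0.01                          # tolerable error on the confidence

def control(probs):
    """Pass the prediction through for authorised inputs; otherwise
    return a random confidence vector (i.e. a randomly predicted class)."""
    p_max = tf.reduce_max(probs, axis=-1, keepdims=True)               # highest confidence
    e_c = tf.reduce_min(tf.abs(p_max - C_FP), axis=-1, keepdims=True)  # minimal error Ec
    authorised = tf.cast(e_c < TOLERANCE, probs.dtype)
    random_probs = tf.nn.softmax(tf.random.uniform(tf.shape(probs)), axis=-1)
    return authorised * probs + (1.0 - authorised) * random_probs

# The control layer is appended to the end of the protected model, e.g.
# protected = tf.keras.Model(model.input, layers.Lambda(control)(model.output)).
# Once a fingerprint passes authentication, the model is reloaded without this layer.
```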
3.3 Users’ fingerprints generation
In this section, we describe how to generate users' fingerprints (i.e. specific adversarial examples). Formally, the protected DNN has K classes and T assignable confidences (i.e. the set Cfp has T elements). The set of K classes is represented by Lfp = {0, 1, 2, …, K − 1}, and the set of T confidences is denoted as Cfp = {c1, c2, …, cT}. In this way, a total of K × T fingerprints can be assigned to users. The fingerprint of a user is denoted as f, and all the K × T fingerprints constitute the fingerprint database FP, that is, FP = {f1, f2, …, fK × T}.
- The generated adversarial example should be classified as the target class t with a fixed confidence c by the model M, where t is a class randomly selected from the set Lfp and c is a confidence in Cfp.
The term g0(x′) ensures that the generated adversarial example x′ is classified as the class t. The confidence control term ‖Z(x′)t − c‖ guarantees that the confidence on the target class t is close to the predefined value c, where c is a legal confidence in the set Cfp.
The proposed ActiveGuard method generates effective adversarial examples by minimising the above optimisation function (Equation 1) with the g(⋅) defined in Equation (3). Figure 3 presents some adversarial examples generated from the MNIST [14] and CIFAR-10 [15] datasets, respectively. For the MNIST dataset (as shown in Figure 3a), the target classes of the generated adversarial examples are ‘4’ (the first row) and ‘5’ (the second row), while the confidences output by the DNN model are 0.40 and 0.50, respectively. For the CIFAR-10 dataset (as shown in Figure 3b), the target classes of the generated adversarial examples are ‘0’ (the first row) and ‘9’ (the second row), while the confidences output by the DNN model are 0.20 and 0.30, respectively.

Some examples of users' fingerprints generated from the MNIST and CIFAR-10 datasets. (a) Adversarial examples generated from the MNIST dataset. The adversarial examples in the first row are classified by the protected deep neural network as class ‘4’ with confidence 0.40, while the adversarial examples in the second row are classified as class ‘5’ with confidence 0.50. (b) Adversarial examples generated from the CIFAR-10 dataset. The adversarial examples in the first row are classified as class ‘0’ (i.e. ‘Airplane’) with confidence 0.20, while the adversarial examples in the second row are classified as class ‘9’ (i.e. ‘Truck’) with confidence 0.30.
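For concreteness, a minimal sketch of the fingerprint-generation objective is given below; the exact forms of Equations (1)-(3) are those defined above, while the margin-style g0 term, the distortion term, and the helper names used here are illustrative assumptions consistent with the C&W-based generation described in Section 4.1.3.

```python
import tensorflow as tf

def fingerprint_objective(model, x_adv, x_clean, t, c, alpha):
    """Sketch: craft x_adv so that it is classified as class t with a
    confidence close to c, while staying close to the clean image."""
    probs = model(x_adv)                                   # confidence vector Z(x')
    target_conf = probs[:, t]                              # Z(x')_t
    # g0: a margin-style term that forces class t to become the predicted class
    masked = probs - tf.one_hot(t, probs.shape[-1]) * 1e9
    g0 = tf.nn.relu(tf.reduce_max(masked, axis=-1) - target_conf)
    conf_term = tf.abs(target_conf - c)                    # confidence control term ||Z(x')_t - c||
    distortion = tf.reduce_sum(tf.square(x_adv - x_clean), axis=[1, 2, 3])
    return tf.reduce_mean(distortion + alpha * (g0 + conf_term))
```

In practice, x_adv would be optimised by gradient descent (e.g. with the Adam optimiser and the learning rates listed in Section 4.1.3) until the protected model outputs the pair (t, c) within the tolerable error ε.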
3.4 Users' fingerprints management
Next, we discuss how to distinguish different authorised users, that is, users' fingerprints management. To this end, the proposed ActiveGuard ensures that each authorised user is assigned a unique fingerprint (i.e. a specific adversarial example). The users' fingerprints management includes two parts: users' fingerprints allocation and users' fingerprints authentication, which will be discussed as follows.
3.4.1 Users' fingerprints allocation
We assign a unique fingerprint to each user based on the class and the confidence. In other words, the fingerprint of a user will be uniquely determined by a class t with a confidence c. The legal combination of class t and confidence c is denoted as fingerprinting output (FO), that is, FO = {(t, c)|t ∈ Lfp, c ∈ Cfp}. Therefore, the users' fingerprints allocation can be implemented by assigning users with different FOs, as follows: the FO = (i, cj) is assigned to the (i ⋅ T + j)-th user, where i ∈ {0, 1,…, K − 1}, j ∈ {1, 2, …, T}.
The process of users' fingerprints allocation is illustrated in Figure 4. First, according to all K class labels and T predefined confidences, a total of K × T legal FOs are obtained. Second, users' fingerprints are generated based on the above FOs. Last, the generated fingerprints are allocated to authorised users where each authorised user will acquire a fingerprint. For the protected DNN model, the authorised user with fingerprint fi⋅T + j will be classified as class i with confidence cj.

The process of users' fingerprints allocation.
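A small sketch of this allocation rule, assuming K = 10 and T = 3 as in the experiments (the variable names are illustrative):

```python
# Enumerate all K x T legal fingerprinting outputs (FOs) and assign
# FO = (i, c_j) to the (i*T + j)-th authorised user.
L_FP = list(range(10))             # class labels, K = 10
C_FP = [0.20, 0.30, 0.40]          # confidences, T = 3
T = len(C_FP)

allocation = {}                    # user index -> (class t, confidence c)
for i in L_FP:
    for j, c in enumerate(C_FP, start=1):
        allocation[i * T + j] = (i, c)

# e.g. allocation[4] == (1, 0.20): the 4th user holds a fingerprint that the
# protected DNN classifies as class 1 with confidence 0.20
```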
3.4.2 Users' fingerprints authentication
The users' fingerprints authentication involves the following two steps:

- (i) Determine the legitimacy of each user. Given a DNN model M and an input x, the output confidence of M is a K-dimensional vector P(x), where P(x) = {P(x)1, P(x)2, …, P(x)K}. The highest confidence in P(x) is denoted as Pmax(x), where Pmax(x) = max{P(x)1, P(x)2, …, P(x)K}. The output class with the highest confidence is denoted as M(x). For a user u with the fingerprint f, if M(f) ∈ Lfp and Pmax(f) ∈ Cfp, the user authentication is considered to be successful, and the DNN will treat this user u as an authorised user. For a user with the fingerprint f′, if M(f′) ∉ Lfp or Pmax(f′) ∉ Cfp, the authentication fails and the user is regarded as an unauthorised user.
- (ii) Determine the identity of each authorised user. Each user's fingerprint is determined by a unique adversarial example, which is classified by the DNN model as the class M(f) (i.e. target class t) with the confidence Pmax(f) (i.e. confidence c). Based on the above prediction result (M(f), Pmax(f)), the identity of each authorised user can be determined.
There will be a slight error between the output confidence Pmax(f) and c during the users' fingerprints authentication, that is, Pmax(f) ≈ c, where c is a legal confidence in Cfp. We denote the tolerable error of confidence as ɛ. If the confidence Pmax(f) output by the DNN model is within the error range (c − ɛ, c + ɛ), the Pmax(f) is considered to be matched with c, that is, the user passes the identity authentication.
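A minimal sketch of this authentication check is given below; the function and variable names are illustrative, and allocation denotes the user-to-(t, c) mapping of Section 3.4.1.

```python
import numpy as np

EPSILON = 0.01                     # tolerable confidence error

def authenticate(model, fingerprint, allocation):
    """Return the index of the authorised user whose (t, c) matches the
    model's prediction on the submitted fingerprint, or None otherwise."""
    probs = model.predict(fingerprint[np.newaxis, ...])[0]
    t_pred = int(np.argmax(probs))                  # M(f)
    p_max = float(np.max(probs))                    # Pmax(f)
    for user_id, (t, c) in allocation.items():
        # pass if the class matches and Pmax(f) lies within (c - eps, c + eps)
        if t_pred == t and abs(p_max - c) < EPSILON:
            return user_id
    return None                                     # unauthorised user
```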
3.5 Copyright verification
Finally, we introduce how to verify the ownership of DNN models. Note that it is difficult for the adversarial example based method to distinguish the model owner from authorised users. Therefore, we propose a watermarking scheme for copyright verification.
Inspired by the watermark embedding method in work [1], this paper embeds an n-digits watermark into the weights of a convolutional layer of DNN for copyright verification. Compared to the watermarking method in the work [1], our proposed watermark embedding method has two significant advantages: (i) First, the watermark in work [1] consists of binary strings (0/1), where each bit of a watermark can only be embedded with two different digits (0 or 1). The proposed method extends the form of watermarking to numerical digits 0–9 through linear mapping. This allows each bit of a watermark to be embedded with 10 different digits (i.e. 0–9), which greatly improves the capacity of watermark embedding. (ii) Second, the proposed method can embed the watermark discretely, that is, the watermark can be embedded into discontinuous weights of a DNN model, while the watermark in work [1] is embedded in consecutive positions. As a result, our embedded watermark is stealthier (more difficult to be noticed) and more flexible.
The process of the proposed copyright verification method includes two phases: watermark embedding, and watermark extraction and verification.
3.5.1 Watermark embedding
First, to provide ownership verification, the proposed ActiveGuard embeds an n-digits watermark wm = (d1, d2,…, dn) into the weights of a DNN model M. In this paper, the watermark is embedded into a specific convolutional layer of the model. For a DNN model, the weight matrix of a convolutional layer can be denoted as a 4-dimensional tensor D = (F, F, I, O), where F is the size of the convolution kernel, and I and O are the numbers of input channels and output channels respectively [1, 8]. In this paper, we embed the watermark into the maximum component among all O components of tensor D, where each component is a tensor in the form of (F, F, I). In this way, a total of F × F × I positions are available for watermark embedding, and the weights at these m (m = F × F × I) positions are denoted as a vector w. We define another weight vector v to represent the weights at the n (n < m) randomly selected positions where the watermark is embedded.
The mapping function map(⋅) is designed through the following steps:

- (i) Extract the weights from the original/clean DNN model and calculate the range(w) of the vector w. The range(⋅) is used to find the minimal value wmin and the maximal value wmax in the vector w, and returns a range [wmin, wmax]. For example, if w = (0.18, 0.30, 0.16, 0.18, 0.22, 0.34), then range(w) = [0.16, 0.34], that is, wmin is 0.16 and wmax is 0.34.
- (ii) Determine the constants a and b of the function map(⋅). The map(⋅) maps the above range(w) to the numerical digits [0, 9], that is, map(range(w)) = [0, 9]. For example, if range(w) = [0.16, 0.34], the constants a and b can be calculated by solving the equation [0.16, 0.34]T ⋅ a + b = [0, 9]T, and the solution of the equation is: a = 50, b = −8. In this way, the watermark mapping function map(⋅) is d = 50h − 8.
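A tiny sketch of this calculation (the weight values follow the example above):

```python
# Determine a and b so that map(h) = a*h + b maps [w_min, w_max] onto [0, 9].
w = [0.18, 0.30, 0.16, 0.18, 0.22, 0.34]
w_min, w_max = min(w), max(w)          # range(w) = [0.16, 0.34]
a = 9.0 / (w_max - w_min)              # a = 50
b = -a * w_min                         # b = -8
map_fn = lambda h: a * h + b           # map(0.16) = 0, map(0.34) = 9
```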
In conclusion, the watermark is embedded by modifying the values of the vector w so that the specific n-digits watermark is embedded into the weights of a specific convolutional layer. The overall process is summarised as follows: (i) n different positions are randomly selected from the above m (F × F × I) positions in the vector w for watermark embedding. The n positions are denoted as a vector p (p = (p1, p2, …, pn)), while the weights at the selected n positions are represented by a vector v = (v1, v2, …, vn). (ii) The mean square error [26] (in the second term of Equation 4) is used to calculate the distance between the watermark vector wm and the vector map(v). (iii) By optimising the loss function (Equation 4), the values in the vector map(v) are driven to approach the values in the watermark vector wm. In this way, the watermark wm can be successfully embedded into the weights of a specific convolutional layer of the target DNN (through a linearly amplified mapping).
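A minimal sketch of the embedding regulariser (the second term of Equation 4) is given below, assuming a Keras model; the way the (F, F, I) component is selected and the helper names are illustrative (the paper embeds into the maximum component among the O components).

```python
import tensorflow as tf

def watermark_regulariser(model, layer_name, component_index, positions,
                          wm_digits, a, b, lam=0.01):
    """lam * MSE(map(v), wm): pulls the weights v at the chosen positions p
    towards the values that map to the owner's watermark digits."""
    kernel = model.get_layer(layer_name).kernel              # shape (F, F, I, O)
    component = tf.reshape(kernel[..., component_index], [-1])
    v = tf.gather(component, positions)                      # weights v at positions p
    mapped = a * v + b                                        # map(v): weights -> digits 0-9
    wm = tf.constant(wm_digits, dtype=mapped.dtype)
    return lam * tf.reduce_mean(tf.square(mapped - wm))

# During training (from scratch or by fine-tuning), this term is added to the
# normal task loss, so that map(v) gradually approaches the watermark wm.
```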
3.5.2 Watermark extraction and verification
When the model owner suspects that his DNN model is pirated, he can extract the embedded watermark from the suspected DNN and verify the ownership.
- Step 1. The model owner obtains the parameters of the suspected model M′, that is, the target convolutional layer l, and the weights Dl of the layer l (Line 1 in Algorithm 1).
- Step 2. The model owner extracts the weights v of the target convolutional layer l at the positions p where the watermark is embedded (Line 2 in Algorithm 1).
- Step 3. The model owner exploits the function map(⋅) to map these extracted weights to the watermark digits, and compares the mapping result wvp with the watermark wm. If the wvp is consistent with wm, the model M′ is considered to be a pirated model. Otherwise, the model M′ is not a pirated model (Lines 3–9 in Algorithm 1).
In Step 3, since the values of the weights are floating-point numbers, we round each digit of wvp to an integer. The rounding operation can be expressed as wvp = Round(wvp).
Algorithm 1. Watermark extraction and verification
Input: Suspected model M′; target convolutional layer l; positions of embedded watermark p; watermark mapping function map(⋅); the model owner's watermark wm.
Output: Verified result R.
1: Dl ← Get_Target_Layer_Weights(M′, l);
2: v ← Get_Embedded_Weights(Dl, p);
3: Process the weights of the target convolutional layer:
   wvp = map(v);
   wvp ← Round(wvp);
4: if Equal(wm, wvp) == True then
5:   R = True;
6: else
7:   R = False;
8: end if
9: return R.
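A Python sketch of Algorithm 1 is given below, assuming a Keras model; the target layer name, the embedding positions p, and the mapping function map(⋅) are kept secret by the model owner, and the helper names here are illustrative.

```python
import numpy as np

def verify_ownership(suspect_model, layer_name, positions, map_fn, wm,
                     component_index=0):
    """Extract the watermark from the suspected model M' and compare it
    with the owner's watermark wm (returns the verified result R)."""
    # Step 1: weights D_l of the target convolutional layer of M'
    D_l = suspect_model.get_layer(layer_name).get_weights()[0]   # (F, F, I, O)
    # Step 2: weights v at the embedding positions p
    component = D_l[..., component_index].reshape(-1)
    v = component[positions]
    # Step 3: map the weights to digits, round, and compare with wm
    wvp = np.rint(map_fn(v)).astype(int)
    return bool(np.array_equal(wvp, np.asarray(wm)))              # R
```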
4 EXPERIMENTAL RESULTS
In this section, we evaluate the effectiveness and robustness of the proposed ActiveGuard method on the LeNet-5 [16] and the WRN [17] models. The three functions of ActiveGuard, that is, authorisation control, users' fingerprints management and copyright verification, are evaluated in Sections 4.2-4.4, respectively. Further, in Section 4.5, we demonstrate that the proposed ActiveGuard method is robust to three types of fingerprint forgery attacks and two watermark removal attacks (i.e. fine-tuning [27, 28] attack and pruning [29] attack). Lastly, the proposed ActiveGuard method is compared with existing active DNN copyright protection methods in Section 4.6.
4.1 Experimental setup
The experiments are implemented in Python 3.7 with Keras [25] and Tensorflow [30] platforms.
4.1.1 Datasets
- MNIST dataset [14]. The MNIST dataset is a handwritten digit recognition dataset, which contains 60,000 training images and 10,000 test images. There are a total of 10 different classes in the MNIST dataset: {0, 1, 2, …, 9} [14]. Each image is a grey image with the size of 28 × 28.
- CIFAR-10 dataset [15]. The CIFAR-10 dataset consists of 50,000 training images and 10,000 test images, which contains 10 different classes [15]. Each CIFAR-10 image is a 3-channel colour image with the size of 32 × 32.
4.1.2 DNN models
We train the LeNet-5 [16] model on the MNIST [14] dataset, and train the WRN [17] model on the CIFAR-10 [15] dataset, respectively. The proposed ActiveGuard aims to embed the watermark into a specific convolutional layer of the DNN model. The LeNet-5 model has two convolutional layers, and the WRN model has four convolutional layers. Compared to the LeNet-5 model trained on the MNIST dataset (grey, single-channel), the WRN model contains more convolutional layers and the CIFAR-10 images are more complex (coloured, three-channel). Therefore, in our experiments, the LeNet-5 model is trained on the MNIST dataset for 50 epochs, while the WRN model is trained on the CIFAR-10 dataset for 200 epochs. For the LeNet-5 model, we use the Adam [31] optimiser and categorical cross-entropy loss to train the model. For the WRN model, we follow the training setting in existing works [1, 17], and use the SGD optimiser with momentum and categorical cross-entropy loss to train the model.
4.1.3 Users' fingerprints settings
For users' fingerprints generation, we adopt the C&W [13] method to generate the specific adversarial examples. For the MNIST [14] dataset, the learning rate of the optimiser is set to be 0.005. The range of the constant α is 0–40, and the initial α is set to be 20. For the CIFAR-10 [15] dataset, the learning rate of the optimiser is set to be 0.001. The range of the constant α is 0–1, and the initial α is set to be 0.5.
In our experiments, K = 10 (10 classes), and the tolerable error ɛ of the confidence is set to be 0.01. Therefore, if the confidence interval for authorised users is [0.10, 0.50), the proposed ActiveGuard can support up to 200 (10 × (0.50 − 0.10)/0.02) authorised users. Specifically, under the above setting, the candidate confidence set is {0.10, 0.12, 0.14, 0.16, …, 0.46, 0.48}, from which we can choose some confidences to construct the set Cfp. For the sake of simplicity, we only choose three confidences (0.20, 0.30, 0.40) to construct the set Cfp in the experiments. In this way, Cfp = {0.20, 0.30, 0.40}, K = 10, and T = 3. As a result, ActiveGuard assigns fingerprints for 30 authorised users in the experiments.
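The capacity calculation above can be sketched as follows (illustrative variable names):

```python
# Number of assignable fingerprints under the settings of Section 4.1.3.
K = 10                                   # number of classes
epsilon = 0.01                           # tolerable confidence error
low, high = 0.10, 0.50                   # low confidence interval [0.10, 0.50)
step = 2 * epsilon                       # adjacent confidences differ by 2 * epsilon
n_conf = round((high - low) / step)      # 20 candidate confidences
candidates = [round(low + step * i, 2) for i in range(n_conf)]
# candidates = [0.10, 0.12, ..., 0.48]
max_users = K * n_conf                   # 10 * 20 = 200 authorised users
C_fp = [0.20, 0.30, 0.40]                # the 3 confidences used here -> 30 users
```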
4.1.4 Watermarking settings
As discussed in Section 3.5, the watermark is embedded into a convolutional layer of DNN. For the LeNet-5 model and the WRN model, the number of convolutional layers is 2 (conv 1 and conv 2) and 4 (conv 1, conv 2, conv 3, and conv 4), respectively, where the conv l represents the lth convolutional layer of the DNN. Table 1 shows the structure and the maximum length of the embedded watermark at each convolutional layer of the two DNNs. As discussed in Section 3.5, the weight structure of the weight matrix D is (F, F, I, O), and the maximum watermark length is calculated by F × F × I. For example, we can embed at most 150 (5 × 5 × 6) digits into the weights at conv 2 layer (D = (5, 5, 6, 16)) of the LeNet-5 model [16].
In our experiments, the length of the watermark is set to be 13, that is, the watermark contains 13 digits. The watermark is embedded at the position p in the conv 2 layer, where p is randomly selected from all 150 (LeNet-5) and 576 (WRN) positions in the conv 2 layer. Additionally, the range of weight at the conv 2 layer of the LeNet-5 model is [0.10, 0.45]. Therefore, the aforementioned watermark mapping function is calculated by equation [0.10, 0.45]T ⋅ a + b = [0, 9]T, where the solution of the above equation is a = 180/7 and b = −18/7. In this way, the watermark mapping function is d = (180/7)h − (18/7). Similarly, for the WRN model [17], the range of weight at the conv 2 layer is [0.20, 1.10], thus the watermark mapping function is calculated by equation [0.20, 1.10]T ⋅ a + b = [0, 9]T. The solution of the equation is a = 10 and b = −2, thus the watermark mapping function is d = 10h − 2. We follow the settings in the work [1] to set the constant λ (in Equation 4) to be 0.01.
In addition to the above experimental settings, in Section 4.4, we further evaluate the effectiveness of the proposed ActiveGuard method in terms of the following aspects: (i) the length of watermark is set to other values; (ii) the watermark is embedded into different convolutional layers.
4.2 Authorisation control performance
As discussed in Section 3.2, ActiveGuard allows authorised users to use the DNN model normally while restricting unauthorised users. Figure 5 shows the test accuracy for authorised usage and unauthorised usage. The test accuracy for authorised users is 99.15% on the MNIST [14] and 91.46% on the CIFAR-10 [15] datasets, respectively. However, the performance for unauthorised usage is much lower, as the test accuracy is only 8.92% on the MNIST dataset and 10% on the CIFAR-10 dataset. Therefore, the proposed ActiveGuard method can achieve active authorisation control, and can effectively prevent DNN models from being used without authorisation. The reason why the test accuracy for unauthorised users is close to 10% is as follows. As mentioned in Section 3.2, the control layer of the DNN outputs a randomly chosen class to unauthorised users. Hence, the test accuracy for unauthorised users is equivalent to the accuracy of randomly guessing a class from all K classes, which is about 1/K in statistics. There are 10 classes in both the MNIST and CIFAR-10 datasets, thus the test accuracy for unauthorised users is around 10%.

The test accuracy for authorised usage and unauthorised usage on MNIST and CIFAR-10 datasets.
4.3 Users' fingerprints management performance
The proposed ActiveGuard method assigns a unique adversarial example to each user as his fingerprint. As discussed in Section 4.1, there are 10 different class labels (Lfp = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}) and 3 different confidences (Cfp = {0.20, 0.30, 0.40}) in the experiments. In this way, there are a total of 30 (10 × 3) FOs. Each FO is a combination of class label t and confidence c, and each FO corresponds to an authorised user. Note that, in our experiments, to evaluate the effectiveness of the proposed method, we generate 100 different fingerprints for each authorised user. In other words, for each combination (t, c), we generate 100 different adversarial examples as the fingerprints, and then submit these 100 fingerprints to the DNN model to calculate the authentication success rate of an authorised user. However, when the DNN model is deployed in real world, the model owner only requires to generate one adversarial example for each authorised user, and each user only needs to submit one fingerprint for identity authentication.
Table 2 reports the fingerprint authentication success rate of the 30 authorised users, where the fingerprint of each user is represented by a combination (t, c). For the MNIST dataset [14], the fingerprint authentication success rate of the 30 users is 96% at the lowest, 100% at the highest, and 15 users achieve a success rate of 100%. For the CIFAR-10 dataset [15], the minimum and maximum fingerprint authentication success rates of the 30 users are 99% and 100%, respectively, and 27 users achieve a success rate of 100%. It is shown that all authorised users can successfully pass the identity authentication with a high success rate. In ActiveGuard, an authorised user only requires one specific adversarial example for identity authentication, which is efficient. In addition, it can be inferred from Table 2 that ActiveGuard can effectively generate a unique adversarial example as the fingerprint of each authorised user. Note that, in real life, fingerprint authentication success rates of 96%–100% or 99%–100% (with most users at 100%) are acceptable and usable. As a reference, in the existing DNN IP protection works, the accuracy of DNN ownership verification methods ranges from around 90% to 100%. For example, in the work [7], a watermark is embedded in the DNN model for remote verification, and the watermark accuracy of the watermarked DNN models ranges from 94.1% to 100%. Yang et al. [32] propose a DNN fingerprinting method to verify the ownership of the DNN model. The accuracy of the fingerprint verification is 99.34% and 97.69% on the CIFAR-10 and Tiny-ImageNet datasets, respectively. In addition, in the traditional biometric recognition field, the accuracy of authentication is also lower than 100%. For example, in the work [33], the accuracy of face recognition is 97.35% on the labelled faces in the wild (LFW) dataset. In the work [34], the accuracy of face recognition is 99.63% on the LFW dataset and 95.12% on the YouTube Faces dataset, respectively.
Dataset | Confidence c | t = 0 (%) | t = 1 (%) | t = 2 (%) | t = 3 (%) | t = 4 (%) | t = 5 (%) | t = 6 (%) | t = 7 (%) | t = 8 (%) | t = 9 (%)
---|---|---|---|---|---|---|---|---|---|---|---
MNIST | 0.20 | 99 | 99 | 99 | 99 | 96 | 100 | 99 | 99 | 100 | 100
MNIST | 0.30 | 98 | 100 | 100 | 100 | 97 | 100 | 98 | 98 | 100 | 99
MNIST | 0.40 | 99 | 100 | 100 | 100 | 97 | 100 | 98 | 100 | 100 | 100
CIFAR-10 | 0.20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 100 | 100
CIFAR-10 | 0.30 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
CIFAR-10 | 0.40 | 100 | 99 | 100 | 100 | 100 | 100 | 100 | 99 | 100 | 100

- Note: t denotes the class label of the user's fingerprint.
4.4 Copyright verification performance
In the experiment, we embed a 13-digit watermark wm1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 2, 1, 0] into the model to protect the copyright of the DNN. For watermarks with other lengths, the proposed watermark embedding method is also feasible. We embed the watermark with two different approaches: (i) embedding the watermark by training from scratch; (ii) embedding the watermark by fine-tuning. For the first scenario, the model owner can embed the watermark into the model during the training phase. For the second scenario, the model owner can embed the watermark into a pre-trained model through fine-tuning. This paper defines a metric named Vowner to evaluate the performance of ownership verification. If the watermark is successfully extracted from the target layer of DNN model, Vowner = success, otherwise, Vowner = failure.
Table 3 shows the results of ownership verification. In Table 3, accuracy change denotes the test accuracy of the watermarked DNN model minus the test accuracy of the DNN model without the watermark. It is shown that the watermark can be successfully extracted to verify the ownership of the DNN model, regardless of the embedding approach (training from scratch or fine-tuning). Meanwhile, the normal performance of the watermarked DNN is similar to the performance of the DNN without the watermark. The test accuracy of the two DNNs without the watermarks is 99.12% (LeNet-5) and 91.38% (WRN), respectively, while the test accuracy of the two watermarked DNNs is 99.15% (LeNet-5) and 91.46% (WRN), respectively. The accuracy change is +0.03% (LeNet-5) and +0.08% (WRN), respectively, which indicates that the test accuracy is not affected by watermark embedding. The reason is that the DNN model is often over-parameterised [1, 2, 8], which makes the loss function of a DNN model have many local minima. All these local minima can yield similarly good accuracy [1, 2, 8], thus the normal performance of the DNN model will not be affected by watermark embedding.
Dataset | Model | Epoch | Test accuracy (%) | Accuracy change | Vowner
---|---|---|---|---|---
MNIST | LeNet-5 without watermarks | 50 | 99.12 | N/A | N/A
MNIST | Watermarked LeNet-5 (by training from scratch) | 50 | 99.15 | +0.03% | Success
MNIST | Watermarked LeNet-5 (by fine-tuning) | 20 | 99.12 | 0 | Success
CIFAR-10 | WRN without watermarks | 200 | 91.38 | N/A | N/A
CIFAR-10 | Watermarked WRN (by training from scratch) | 200 | 91.46 | +0.08% | Success
CIFAR-10 | Watermarked WRN (by fine-tuning) | 30 | 91.38 | 0 | Success
- Abbreviation: WRN, wide residual network.
Further, we explore the impact of different watermark lengths and different target convolutional layers on the effectiveness of the proposed watermarking based copyright verification method. Specifically, we generate watermarks with four different lengths (13, 25, 50, 100), and embed them in the convolutional layers of the LeNet-5 and WRN models, respectively. Note that the conv 1 layer of LeNet-5 can embed 25 digits at most, that is, only the 13-digits and 25-digits watermarks are feasible for this layer. The test accuracy of the DNN model when embedded with different lengths of watermarks into different convolutional layers is shown in Figure 6. It is shown that, for the LeNet-5 model [16], the test accuracy of the watermarked DNN ranges from 99.11% to 99.15%, which is consistent with the accuracy of the LeNet-5 model (99.12%) without the watermark. For the WRN model [17], the test accuracy of the watermarked DNN ranges from 91.25% to 91.57%, which is also consistent with the test accuracy of the WRN model (91.38%) without the watermark. In the meantime, all the values of Vowner are ‘success’, which indicates that all the embedded watermarks can be successfully extracted for ownership verification. This experimental result indicates that the proposed ActiveGuard method can successfully achieve copyright verification with numerical watermarking, regardless of the length of the watermark and of which convolutional layer the watermark is embedded in. In addition, the watermark embedded through the proposed ActiveGuard method is highly concealed. The reason is that the watermark can be flexibly embedded in different layers and in different weights (discrete positions) of a target layer. Hence, even if an adversary knows that the DNN is watermarked, it is still difficult for him to find out which layer and which weights the watermark is embedded in.

The test accuracy of deep neural networks when embedded with different lengths of watermarks into different convolutional layers. (a) LeNet-5 (b) Wide Residual Network (WRN).
4.5 Robustness of the proposed ActiveGuard method
In this section, the robustness of the proposed method against different attacks is evaluated, including three fingerprint forgery attacks and two watermark removal attacks (the fine-tuning [27, 28] attack and the pruning [29] attack).
4.5.1 Fingerprint forgery attacks
In the real world, an attacker may attempt to pass the authentication by leveraging a forged fingerprint, which is referred to as a fingerprint forgery attack in this paper. Specifically, we consider three different fingerprint forgery attacks: (i) using clean images as fake fingerprints (FPclean); (ii) using fake fingerprints generated by the FGSM [12] method (FPFGSM); (iii) using fake fingerprints generated by the C&W [13] method (FPCW). The FGSM [12] and C&W [13] methods are two popular adversarial example generation methods. For each of the above three attacks, we select 10,000 clean images from the test set of the MNIST [14] (CIFAR-10 [15]) dataset to generate the fake fingerprints. For the FPFGSM and FPCW attacks, the fake fingerprints are generated by exploiting the cleverhans [35] tool. The fingerprint forgery attack is considered to be successful if a forged fingerprint passes the identity authentication. Otherwise, the attack fails. The attack success rate is the proportion of passed fake fingerprints among all the generated forged fingerprints.
Table 4 shows the attack success rates of the three fingerprint forgery attacks on the MNIST [14] and CIFAR-10 [15] datasets. It is shown that the attack success rates of the three forgery attacks are all lower than 0.1%. The reason is that this paper exploits rare and specific adversarial examples as the unique fingerprint of each user, and each fingerprint must be classified by the DNN model as a target class t with a fixed confidence c. Therefore, an adversary who does not know the protection mechanism and parameter settings cannot construct a user's fingerprint that satisfies the above condition. Besides, the fixed confidence c ranges from 0.10 to 0.50, and under normal circumstances very few inputs are classified as a class with a confidence in this low interval (0.10–0.50). In conclusion, the proposed ActiveGuard method can effectively resist different fingerprint forgery attacks.
Dataset | Fingerprint forgery attack | Attack success rate (%)
---|---|---
MNIST | FPclean attack | 0.01
MNIST | FPFGSM attack | 0.10
MNIST | FPCW attack | 0.01
CIFAR-10 | FPclean attack | 0.01
CIFAR-10 | FPFGSM attack | 0.02
CIFAR-10 | FPCW attack | 0.05
4.5.2 Watermark removal attacks
Further, we evaluate the robustness of the proposed ActiveGuard method against two watermark removal attacks: model fine-tuning [27, 28] attack and model pruning [29] attack.
Model fine-tuning [27, 28] attack. The adversaries can exploit the fine-tuning attack to generate a new model [27, 28]. In our experiments, we select 7000 images from the test set of the MNIST dataset as the training data to fine-tune the watermarked LeNet-5, and select 7000 images from the test set of CIFAR-10 dataset as the training data to fine-tune the watermarked WRN. The remaining 3000 test images of the MNIST and CIFAR-10 datasets are used as the test data for LeNet-5 and WRN, respectively. We use the SGD optimiser with a learning rate of 0.001 to perform fine-tuning attacks. The fine-tuning attack is performed on each watermarked DNN (LeNet-5 [16] and WRN [17]) for 30 and 50 epochs, respectively.
Table 5 shows the test accuracy and ownership verification results of the watermarked DNN under the fine-tuning attack. It can be seen that, before and after different epochs of fine-tuning attacks, the test accuracy of the watermarked DNN is consistent, and the ownership of the watermarked model can be successfully verified. Specifically, after 30 and 50 epochs of fine-tuning, the embedded watermark in the LeNet-5 and WRN models can still be successfully extracted. In conclusion, the proposed ActiveGuard method is robust against fine-tuning attacks.
Model pruning [29] attack. Pruning is a widely used operation for model compression, which aims to prune the DNN by removing redundant parameters from the neural network [29]. An adversary may try to perform pruning attacks to remove the embedded watermark in a DNN model. In our experiments, we assume that the adversary knows the layer where the watermark is embedded, which is a strong attack assumption. We adopt the pruning method in the work [29] to prune the target layer (i.e. the layer where the watermark is embedded) of the watermarked DNNs (LeNet-5 and WRN). For the target layer, the r% weights with the smallest absolute values are pruned, where r% is the pruning rate. The pruned weights are set to 0. As discussed in Section 4.4, the watermark embedded into the LeNet-5 and WRN models is wm1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 2, 1, 0].
Model | Attack | Test accuracy (%) | Vowner
---|---|---|---
LeNet-5 | None | 99.15 | Success
LeNet-5 | Fine-tuning attack (30 epochs) | 99.53 | Success
LeNet-5 | Fine-tuning attack (50 epochs) | 99.53 | Success
WRN | None | 91.46 | Success
WRN | Fine-tuning attack (30 epochs) | 91.47 | Success
WRN | Fine-tuning attack (50 epochs) | 91.53 | Success
- Abbreviation: DNN, deep neural networks.
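For reference, a minimal sketch of the magnitude-based pruning attack described above (following [29]) is given below; the helper name and the exact tie-breaking are illustrative.

```python
import numpy as np

def prune_layer(weights, rate):
    """Set the fraction `rate` (e.g. 0.5 for a 50% pruning rate) of the
    weights with the smallest absolute values to zero."""
    flat = weights.flatten()                       # works on a copy
    k = int(rate * flat.size)                      # number of weights to prune
    if k > 0:
        threshold = np.sort(np.abs(flat))[k - 1]   # k-th smallest magnitude
        flat[np.abs(flat) <= threshold] = 0.0
    return flat.reshape(weights.shape)
```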
Table 6 shows the robustness of the watermarked DNN against the pruning attack. First, as the pruning rate increases, the test accuracy of the LeNet-5 and WRN models decreases slightly. However, even when 60% of the weights in the watermarked DNNs are pruned, the test accuracy of the two models still remains at 98.54% (LeNet-5) and 89.87% (WRN), respectively. Besides, the embedded watermark performs well in terms of ownership verification. Specifically, even when 90% of the weights are pruned, the watermark embedded in the WRN model can still be successfully extracted. However, the ownership verification fails when 50% of the weights in the LeNet-5 model are pruned. The reason is that the LeNet-5 [16] model is smaller with fewer parameters. Therefore, when pruning, the smaller values in the watermark (e.g. the weights mapped to watermark digits 0, 1, 2) are more likely to be pruned. As a result, the watermark is affected when the pruning rate increases to 50%. Since the weights with small values are pruned in the model pruning attack, we also evaluate the robustness of the watermarked LeNet-5 model against the pruning attack with a watermark composed of larger digits. Specifically, the watermark wm2 = [3, 8, 7, 6, 8, 7, 6, 9, 9, 4, 8, 6, 5] is embedded into the LeNet-5 model for evaluation. As shown in Table 6, for the LeNet-5 model embedded with wm2, even when 90% of the weights are pruned, the watermark wm2 embedded in the model can still be successfully extracted. The reason is that, in the model pruning attack, the weights with small values are set to 0, so the weights corresponding to a watermark with large digit values will not be pruned. In fact, when 90% of the weights are pruned, only the weights corresponding to watermark digits 0, 1, 2 are pruned, while the embedded watermark wm2 is not affected. Therefore, even for a small DNN model, embedding a watermark that does not contain small digits (e.g. 0, 1, 2) can still effectively resist pruning attacks. Overall, the proposed ActiveGuard method is robust against pruning attacks.
Table 6. Robustness of the watermarked DNNs against the pruning attack

Pruning rate (%) | LeNet-5 with wm1: test accuracy (%) | LeNet-5 with wm1: V_owner | LeNet-5 with wm2: test accuracy (%) | LeNet-5 with wm2: V_owner | WRN with wm1: test accuracy (%) | WRN with wm1: V_owner
---|---|---|---|---|---|---
0 | 99.15 | Success | 99.16 | Success | 91.46 | Success |
10 | 99.15 | Success | 99.15 | Success | 91.40 | Success |
20 | 99.15 | Success | 99.09 | Success | 91.30 | Success |
30 | 99.07 | Success | 99.08 | Success | 90.99 | Success |
40 | 98.97 | Success | 99.06 | Success | 91.06 | Success |
50 | 98.89 | Failure | 99.00 | Success | 90.28 | Success |
60 | 98.54 | Failure | 98.93 | Success | 89.87 | Success |
70 | 95.80 | Failure | 98.75 | Success | 88.03 | Success |
80 | 91.02 | Failure | 97.46 | Success | 80.40 | Success |
90 | 68.38 | Failure | 85.36 | Success | 48.35 | Success |
- Note: wm1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 2, 1, 0], wm2 = [3, 8, 7, 6, 8, 7, 6, 9, 9, 4, 8, 6, 5].
- Abbreviation: DNN, deep neural networks.
4.6 Comparison with existing active DNN copyright protection methods
Finally, the proposed ActiveGuard method is compared with the existing active DNN copyright protection works [3, 9, 10]. The comparison covers three aspects: active authorisation control, users' fingerprints management, and ownership verification. We directly use the experimental results reported in the three works [3, 9, 10] for comparison. The comparison results are shown in Table 7. The test accuracy of the DNN models without any copyright protection technique is high (about 90% or even higher) both in the existing works [3, 9, 10] and in this paper.
Table 7. Comparison with existing active DNN copyright protection methods

Active works | Datasets | Accuracy for clean DNN (%) | Authorisation control: authorised users (accuracy) | Authorisation control: unauthorised users (accuracy) | Users' fingerprints management | Ownership verification
---|---|---|---|---|---|---
Chen and Wu [3] | MNIST | 99.12 | 99.23% | 0.23% | No | No
Chen and Wu [3] | CIFAR-10 | 90.74 | 90.61% | 0.78% | No | No
Fan et al. [9] | CIFAR-10 | 91.12 | 90.89% | Around 10% | No | Yes
Chakraborty et al. [10] | Fashion MNIST | 89.93 | Around 89.93% | 10.05% | No | No
Chakraborty et al. [10] | CIFAR-10 | 89.54 | Around 89.54% | 9.37% | No | No
ActiveGuard | MNIST | 99.12 | 99.15% | 8.92% | Yes | Yes
ActiveGuard | CIFAR-10 | 91.38 | 91.46% | 10% | Yes | Yes
- Abbreviation: DNN, deep neural networks.
As shown in Table 7, the proposed ActiveGuard method and the existing active DNN copyright protection methods can all achieve active authorisation control. Specifically, in this paper, for authorised users, the test accuracy of the two protected DNNs (LeNet-5 [16] and WRN [17]) reaches up to 99.15%, which is higher than that of the existing works [9, 10] and similar to that of the work [3]. Meanwhile, for unauthorised users, the test accuracy of the two protected DNNs is as low as 8.92%, which is similar to the works [9, 10]. In the work [3], the test accuracy for unauthorised users is lower than that of the works [9, 10] and our proposed method, because the work [3] designs a special loss function to train the target DNN model so that it yields a low test accuracy for unauthorised users. Overall, the proposed ActiveGuard method performs well in distinguishing authorised users from unauthorised users.

Furthermore, as shown in Table 7, our ActiveGuard method achieves both users' fingerprints management and ownership verification. Compared with the existing active DNN copyright protection methods [3, 9, 10], this paper is the only work that achieves users' fingerprints management; the existing works [3, 9, 10] do not support it and thus cannot meet the requirements of commercial applications. The DNN copyright protection methods [3, 10] also cannot verify the ownership of DNN models. Similar to our work, the method in the work [9] can verify ownership. Nevertheless, the work [9] needs to add multiple passport layers to the model, which introduces high complexity and high overhead. Besides, the work [9] is vulnerable to reverse-engineering attacks and tampering attacks. In contrast, the proposed ActiveGuard embeds the watermark at discrete positions of the embedding layer, which is more flexible and stealthy.
5 CONCLUSION
DNN copyright protection is a cutting-edge area that is still in its infancy. In this paper, an active IP protection technique for DNNs, which uses adversarial examples as users' fingerprints, is proposed. The proposed ActiveGuard method protects the IP of DNNs in three aspects: active authorisation control, users' fingerprints management, and copyright verification. Specifically, the proposed method exploits adversarial examples for users' identities management, in which a special adversarial example is regarded as the unique fingerprint of each user. In this way, only an authorised user with the allocated adversarial example can pass the identity authentication and obtain normal usage. Besides, a watermark is embedded into the weights of the DNN model for ownership verification. Most existing works are passive verification methods, while this work provides active copyright protection and copyright management for DNNs. Compared to the few existing active DNN IP protection works, the proposed method provides two additional functions (users' fingerprints management and ownership verification), which are essential for practical commercial applications, while introducing lower overhead. Experimental results demonstrate that the proposed ActiveGuard method achieves authorisation control and users' fingerprints management via unique adversarial examples. Meanwhile, ActiveGuard can embed a discrete and stealthy watermark for copyright verification without affecting the normal model performance. Furthermore, the proposed active DNN copyright protection method is demonstrated to be robust against three fingerprint forgery attacks and two watermark removal attacks (model fine-tuning and pruning).
AUTHOR CONTRIBUTIONS
Mingfu Xue: Conceptualization; Methodology; Project administration; Supervision; Writing – review & editing. Shichang Sun: Data curation; Investigation; Resources; Software; Validation; Writing – original draft. Can He: Investigation; Writing – original draft. Dujuan Gu: Methodology; Project administration. Yushu Zhang: Methodology; Project administration; Supervision. Jian Wang: Methodology; Project administration; Supervision. Weiqiang Liu: Methodology; Project administration; Supervision.
ACKNOWLEDGEMENTS
This work is supported by the National Natural Science Foundation of China (No. 61602241) and CCF-NSFOCUS Kun-Peng Scientific Research Fund (No. CCF-NSFOCUS 2021012).
CONFLICT OF INTEREST STATEMENT
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Open Research
DATA AVAILABILITY STATEMENT
The data are available from the corresponding author upon reasonable request.