Modulation scheme recognition using convolutional neural network

Convolutional neural networks (CNNs) are extremely powerful machine-learning tools, especially for computer vision problems. Here, the authors present a CNN-based modulation recognition model. To fully exploit the powerful image feature extraction ability of CNNs, the authors have created an image dataset of spectrograms of different complex signals using the short-time Fourier transform (STFT). In this way, the complex modulation recognition problem is converted into an image recognition problem. To study the accuracy of automatic recognition of signal spectrograms, the authors have applied two approaches recently developed for image classification. The first is optimising the activation function; experiments show that the best performance is achieved when using sigmoid as the activation function. The second is choosing the optimisation function. Finally, the authors compare recognition accuracy under different signal-to-noise ratios (SNRs). The results show that the authors' model achieves higher recognition accuracy under low SNR and stronger generalisation ability than other recognition methods.


Introduction
In the research field of non-cooperative communications, modulation recognition technology has always been an important branch. Modulation recognition refers to the automatic identification of modulated signals in the presence of noise. It can also serve applications such as demodulation and feature extraction, which play an important role in military intelligence, electronic warfare, electronic reconnaissance, and other areas [1]. Accordingly, many studies have been carried out and various algorithms have been introduced. For example, automatic recognition of analogue and digital modulation can use an artificial neural network (ANN) for classification [2]. In previous studies, achieving higher accuracy at lower SNRs has been one of the top priorities for researchers.
In recent years, deep learning algorithms have made breakthroughs in image classification. The convolutional neural network (CNN) has been identified as a powerful tool in image classification and voice signal processing [3]. Previous works show that CNNs excel at feature extraction. Some researchers in the wireless communications field have started to apply deep neural networks to cognitive radio tasks with some success [4][5][6]. In this work, we propose a new modulation recognition method that draws on the idea of image recognition: a large number of spectrogram images of different modulated signals, obtained by the short-time Fourier transform (STFT), are used as the input of an image classifier. The difficult modulation recognition problem is thereby converted into an image recognition problem which can be solved by a CNN, so the complex feature extraction and data reconstruction of traditional recognition algorithms can be avoided.
The remainder of this paper is organised as follows: In Section 2, four different modulations will be introduced. Our CNN model will be presented in detail in Section 3. In Section 4, different activation functions and optimisation functions will be described. In Section 5, simulation results are provided and analysed, followed by a concluding summary in Section 6.

Modulation scheme
Both analogue and digital modulations can be described by the following general expression:

s(t) = a(t) cos(ω_c t + ϕ(t))

where a(t) is the instantaneous amplitude, ϕ(t) is the generalised instantaneous phase, and ω_c is the carrier angular frequency. Different choices of a(t) and ϕ(t) yield different modulations. The expressions of the four modulations used in this experiment are shown in Table 1. A_0 = 1 represents the DC bias, m(t) = cos(ω_m t) is the modulating signal, ω_m is the modulation angular frequency, and K_f is the FM index. For simplicity, we assume K_f = 1. Both s_1(t) and s_2(t) are unipolar pulses, and when s_1(t) is a positive pulse, s_2(t) is at zero level, and vice versa. Here, the baseband signals of FSK and PSK are randomly generated with a sequence length of 10 symbols. In Table 1, f_1 and f_2 are the carrier frequencies and φ_n represents the absolute phase.
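The four waveform families can be sketched numerically. A minimal NumPy sketch follows; the sampling rate, carrier frequency, and modulating frequency are illustrative assumptions, since the paper does not state its simulation parameters here:

```python
import numpy as np

def generate_signals(fs=8000.0, fc=1000.0, fm=50.0, duration=0.1, seed=0):
    """Generate illustrative AM, FM, 2FSK and BPSK waveforms.

    fs, fc, fm and duration are illustrative choices, not the
    paper's actual simulation parameters.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, duration, 1.0 / fs)
    wc, wm = 2 * np.pi * fc, 2 * np.pi * fm

    # AM: (A_0 + m(t)) cos(w_c t), with A_0 = 1 and m(t) = cos(w_m t)
    am = (1.0 + np.cos(wm * t)) * np.cos(wc * t)

    # FM: cos(w_c t + K_f * integral of m(t)), with K_f = 1
    fm_sig = np.cos(wc * t + np.cumsum(np.cos(wm * t)) / fs)

    # Random binary baseband sequence of 10 symbols, as in the paper
    bits = rng.integers(0, 2, size=10)
    symbols = np.repeat(bits, len(t) // len(bits))
    symbols = np.pad(symbols, (0, len(t) - len(symbols)), mode="edge")

    # 2FSK: switch between carrier frequencies f_1 and f_2
    f1, f2 = fc, 2 * fc
    fsk = np.where(symbols == 1,
                   np.cos(2 * np.pi * f1 * t),
                   np.cos(2 * np.pi * f2 * t))

    # BPSK: absolute phase 0 or pi depending on the symbol
    bpsk = np.cos(wc * t + np.pi * symbols)
    return {"AM": am, "FM": fm_sig, "FSK": fsk, "BPSK": bpsk}
```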

Dataset generation and pre-processing
x(n) represents a real sequence and X_w(mS, ω) represents the output of the STFT. The variable S is a positive integer, which represents the sampling period of X_w(n, ω) in the variable n. The analysis window used in the STFT is denoted by w(n). With little loss of generality, we assume that w(n) is real, of length L, and nonzero only for 0 ≤ n ≤ L − 1. From the definition of the STFT, we have

X_w(mS, ω) = Σ_{l=−∞}^{∞} x(l) w(l − mS) e^{−jωl} = F_l[x_w(mS, l)]

where x_w(mS, l) = x(l) w(l − mS), and F_l[x_w(mS, l)] represents the Fourier transform of x_w(mS, l) with respect to the variable l.

Table 1 Expressions for modulation types

The shape and width of w(n) have a certain influence on the time-frequency resolution of the STFT. The time resolution and frequency resolution are constrained by the Heisenberg uncertainty principle; the length of the window determines both resolutions of the spectrogram [7]. Here, we applied the Hamming window, which has a higher resolution in the frequency domain, and because its sidelobe attenuation exceeds 42 dB, it has the advantage of less spectral leakage. The high-frequency components in the frequency spectrum are weak and fluctuate little, so a smoother spectrum can be obtained.
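The STFT definition above maps directly onto code. A minimal NumPy sketch follows, with the Hamming window as w(n); the window length L and hop S are illustrative values, not the paper's settings:

```python
import numpy as np

def stft_spectrogram(x, L=64, S=16):
    """Compute |X_w(mS, omega)| for a real sequence x(n).

    L (window length) and S (hop, the sampling period in n) are
    illustrative values, not the paper's actual settings.
    """
    w = np.hamming(L)                      # analysis window w(n)
    n_frames = (len(x) - L) // S + 1
    # x_w(mS, l) = x(l) w(l - mS), gathered frame by frame
    frames = np.stack([x[m * S:m * S + L] * w for m in range(n_frames)])
    # Fourier transform of x_w(mS, l) with respect to l
    return np.abs(np.fft.rfft(frames, axis=1))
```

Each row of the result is one time slice of the spectrogram; the resulting image can then be resized and normalised before being fed to the network.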
A simulation tool is used to generate the training and test datasets. Our dataset consists of the four modulated signals described above: AM, BPSK, FSK, and FM. In order to verify the robustness of our method, we add random additive white Gaussian noise during the simulation, with the SNR ranging from −20 to 20 dB in steps of 5 dB. Each type of signal has 36,000 spectrograms in the training set and 16,800 in the test set.
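The additive white Gaussian noise step can be sketched as follows; the helper name and seeding are our own, and only the SNR grid matches the paper:

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so the result has the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))  # SNR = 10 log10(Ps/Pn)
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

# SNR grid used for the dataset: -20 dB to 20 dB in 5 dB steps
snr_grid = np.arange(-20, 25, 5)
```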

Convolution neural network
The CNN has a special structure for feature extraction [8]. The input of a CNN is a 2-D image, and the classification result is given as a probability. A CNN is not a fully connected neural network; instead, key techniques such as local connection and weight sharing make image recognition more compact and invariant, and bring the design closer to that of actual biological neural networks. Weight sharing reduces the complexity of the network, especially when facing multidimensional input. Our model is based on AlexNet [9], a classic and powerful CNN model, modified and improved for the task of modulation recognition. The network of our CNN model is as follows: conv1-maxpool-norm1-conv2-norm2-maxpool-local3-local4-softmax, in which the local3 and local4 layers are fully connected. The model structure is shown in Fig. 1.
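The layer sequence above can be sketched with the tf.keras functional API. This is a sketch under stated assumptions: the filter counts and dense-layer sizes are illustrative, and batch normalisation stands in for LRN because Keras provides no built-in LRN layer:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(num_classes=4):
    """Sketch of conv1-maxpool-norm1-conv2-norm2-maxpool-local3-local4-softmax.

    Filter counts and dense sizes are illustrative assumptions;
    BatchNormalization is a stand-in for the paper's LRN layers.
    """
    inputs = tf.keras.Input(shape=(64, 64, 1))      # 64 x 64 spectrogram
    x = layers.Conv2D(32, 3, padding="same", activation="sigmoid")(inputs)
    x = layers.MaxPooling2D(2)(x)                   # 2 x 2 max pooling
    x = layers.BatchNormalization()(x)              # norm1 (LRN stand-in)
    x = layers.Conv2D(64, 3, padding="same", activation="sigmoid")(x)
    x = layers.BatchNormalization()(x)              # norm2
    x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="sigmoid")(x)  # local3 (fully connected)
    x = layers.Dense(64, activation="sigmoid")(x)   # local4 (fully connected)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```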
The input is a 64 × 64 normalised image. The kernels of our model are of size 3 × 3. After each convolutional layer, 2 × 2 max pooling is used to reduce complexity. The pooling process improves the generalisability of a model by merging multiple semantically similar features into one [10]. Local response normalisation (LRN) is used in the norm layers, which performs well in mitigating overfitting. Categorical cross-entropy was selected as the objective function for the modulation recognition classification task [11]. Categorical cross-entropy is a measure of the difference between two probability distributions. For a deep learning classification task, the probability distribution is usually the softmax output of the classifier network, which is then converted to a one-hot encoding for classification purposes [12]. In our work, by using softmax, the category with the highest confidence value is given as the output category.
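Softmax and categorical cross-entropy as described above can be written compactly in NumPy:

```python
import numpy as np

def softmax(logits):
    """Convert logits to a probability distribution over classes."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    """Mean cross-entropy between one-hot labels and predicted probabilities."""
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=-1))
```

The predicted category is then simply `probs.argmax(axis=-1)`, the class with the highest confidence value.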

Activation functions
In a CNN, the convolution operation is linear: it involves only multiplication and addition, so each layer on its own can perform only a linear transformation. A purely linear model has limited expressive power, so it is necessary to add an activation function to introduce non-linearity.
In the previous studies of neural networks, the effect of activation functions on the performance of neural networks has received considerable attention [13,14]. Studies have shown that different activation functions lead to different performance. The activation functions usually have the following properties: nonlinear, differentiability, monotonicity, and the output value range can be limited or infinite. We have investigated different activation functions for performance comparison, which is shown in Table 2.
First of all, the ReLU function [15] is piecewise linear, and its derivative is 1 for all positive inputs. Thus, the problem of vanishing gradients can be avoided, and the ReLU function also improves computing speed. The AlexNet paper reports that, for the same network structure, using ReLU as the activation function makes convergence more than six times faster than using Tanh [13]. Apart from the ReLU function, the advantage of the Sigmoid function [16] is that its derivative is easy to compute, so the gradient descent algorithm can be implemented at a very low cost. There is also the Tanh function [17], a variant of the Sigmoid function; the difference is that the output of Tanh is zero-centred, so in practical applications Tanh often performs better than Sigmoid. Finally, the Softplus activation function [18], a smooth approximation of ReLU, is also often used in deep neural networks.
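For reference, the four activation functions compared here can be written out directly, along with the closed-form Sigmoid derivative that makes its gradient cheap to compute:

```python
import numpy as np

def relu(x):     return np.maximum(0.0, x)        # 0 for x < 0, x otherwise
def sigmoid(x):  return 1.0 / (1.0 + np.exp(-x))  # output in (0, 1)
def tanh(x):     return np.tanh(x)                # zero-centred, in (-1, 1)
def softplus(x): return np.log1p(np.exp(x))       # smooth approximation of ReLU

def sigmoid_grad(x):
    """Derivative of sigmoid: s(x) * (1 - s(x)), cheap to evaluate."""
    s = sigmoid(x)
    return s * (1.0 - s)
```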

Optimisation functions
The optimisation of the objective function is one of the core issues of machine learning and has long been a focus of attention in the academic community. Here, the most commonly used gradient descent algorithm [19], Adam [20], and the RMSProp algorithm [21] are investigated to compare recognition performance. These algorithms not only improve global search ability and convergence speed but also reduce the influence of hyperparameters (such as the learning rate) on training efficiency, simplifying the training process.
Gradient descent is simple and general, and can be flexibly applied to any differentiable objective function. The RMSProp optimisation function accelerates the rate of gradient descent. The Adam optimisation algorithm is an extension of stochastic gradient descent that iteratively updates neural network weights based on the training data; it has recently become widely used in deep learning applications, especially in tasks such as computer vision and natural language processing. The common feature of RMSProp and Adam is that they automatically adjust the learning rate.
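The single-parameter update rules of the three optimisers can be sketched as follows, using their standard default hyperparameters; this is a sketch of the update mathematics, not the paper's training code:

```python
import numpy as np

def sgd_step(w, g, lr=0.01):
    """Plain gradient descent: move against the gradient."""
    return w - lr * g

def rmsprop_step(w, g, state, lr=0.001, rho=0.9, eps=1e-8):
    """RMSProp: scale the step by a running mean of squared gradients."""
    state["v"] = rho * state["v"] + (1 - rho) * g ** 2
    return w - lr * g / (np.sqrt(state["v"]) + eps)

def adam_step(w, g, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected first and second moment estimates."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g          # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * g ** 2     # second moment
    m_hat = state["m"] / (1 - b1 ** state["t"])          # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)
```

The per-gradient scaling by `sqrt(v)` is what gives RMSProp and Adam their automatic, per-parameter learning-rate adjustment.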

Results
The results of our research are presented in this section. Our model was developed using TensorFlow, a well-known neural-network framework. Thanks to its strong modularity and friendly Python interface, it greatly accelerated our work in building the CNN model.
The recognition accuracy under different activation functions is shown in Fig. 2. Simulation results show that accuracy is close to 100% when SNR ≥ 12 dB. The results show that training the CNN on spectrograms achieves high recognition accuracy. When the SNR is lower than 4 dB, better performance is achieved by using the Sigmoid activation function. At SNR = −10 dB, the recognition accuracy using the Sigmoid function is nearly 10% higher than that using the ReLU function. The accuracy and loss value changes during the training process are shown in Figs. 3 and 4, respectively.
The total number of iterations is 80,000. In the first 20,000 iterations, the accuracy rapidly increases, and after that it shows a slight upward trend. The recognition accuracy under the Sigmoid function exceeds 90% once the number of iterations exceeds 20,000.
During the training process, the loss also gradually converges with the increase of the number of iterations.
The simulation shows that the recognition accuracy of our model using any of the three optimisation functions is over 80% once the SNR reaches −6 dB. When SNR ≥ 2 dB, the recognition accuracy is close to 100%. The accuracy under the three optimisation functions is nearly equivalent, as shown in Fig. 5, which implies that all of these methods can reliably extract sufficient features. As the SNR increases, the recognition accuracy improves, as shown in Fig. 6. The proposed models, the red line (ReLU + Adam) and the blue line (Sigmoid + Adam), outperform the green line (CNN) [3] and the mauve line (CNN2) [11].
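Accuracy-versus-SNR curves like those in Figs. 5 and 6 reduce to grouping test samples by their SNR label; a small helper (the function name is ours) can compute them:

```python
import numpy as np

def accuracy_per_snr(y_true, y_pred, snrs):
    """Group test samples by SNR label and compute accuracy per group."""
    result = {}
    for snr in np.unique(snrs):
        mask = snrs == snr
        result[float(snr)] = float(np.mean(y_true[mask] == y_pred[mask]))
    return result
```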

Conclusion
Here, we present a new method of modulation recognition based on a CNN. We examined both the individual and combined impacts of different activation functions and optimisation functions on recognition accuracy under different SNRs. The simulations show that the proposed method not only achieves good performance but also has strong generalisation ability. More importantly, our model can also be extended to identify modulations beyond the four considered above. Finally, we believe that, given the potential of our model, better accuracy can be achieved in the future.

Acknowledgments
This work was supported in part by the National Natural Science